Table 2.
Overview of relation extraction datasets
Dataset name | Published | Data source | Scale/data size | Relation objects | Relation types | Reference |
---|---|---|---|---|---|---|
IEPA | 2002 | PubMed | 486 sentences/ ~300 abstracts | Chemicals | Binary interaction | [13] |
AIMed | 2005 | MEDLINE | 1955 sentences/ 225 abstracts | Human protein/gene | Binary interaction | [15] |
LLL | 2005 | MEDLINE | 77 sentences | Protein/gene in Bacillus subtilis | 3 types | [14] |
Bioinfer | 2007 | PubMed | 1100 sentences | Protein/gene/RNA and related | 68 types | [16] |
HPRD50 | 2007 | MEDLINE | 145 sentences/ 50 abstracts | Human protein/gene | Binary interaction | [17] |
EMU | 2010 | PubMed | 109 abstracts on mutation | Human protein/gene, disease (prostate cancer/breast cancer) | Binary interaction | [55] |
MLEE corpus | 2012 | PubMed | 2608 sentences/ 262 abstracts on angiogenesis | Organism, anatomy, molecule types (14 entity types included in 3 major entity types) | Anatomical, molecular, general, planned events (18 events included in 4 major types) | [56] |
EU-ADR | 2012 | MEDLINE | 300 abstracts | Drug, disease, and target (gene, protein and sequence variation) | Drug–disease, drug–target, target–disease | [57] |
ADE corpus | 2012 | MEDLINE | 20 967 sentences/ 2972 documents | Drug, adverse effect and dosage | Drug–adverse effect, drug–dosage | [21] |
GAD corpus | 2015 | PubMed | 5329 sentences | Gene and disease | Binary interaction | [58] |
PhenoCHF | 2015 | i2b2 recognizing obesity challenge; PMC | 300 discharge summaries; 10 full-texts | Six COPD-related mentions | 3 types | [59] |
BRONCO | 2016 | PMC | 108 full-texts | Variant, gene, disease, drug and cell-line | Variant with other entities (4 types) | [60] |
N-ary | 2017 | PMC | 264 867 sentences | Drug, gene, mutation | Six types (5 pos, 1 neg) | [61] |
DDAE dataset | 2019 | PubMed | 521 abstracts (400 for training, 121 for test) | Disease | 2 types | [62] |
RENET2 | 2021 | MEDLINE; PMC | 1000 abstracts; 500 full-texts | Gene and disease | Associated, non-associated and ambiguous | [63] |