JNLPBA (Joint Workshop on NLP in Biomedicine and Its Applications) [18] |
Gene/protein concept extraction |
The corpus consists of 2,000 PubMed abstracts as training data and 404 PubMed abstracts as test data. |
|
BioCreAtivE 2004 Task 1A dataset [19] |
Gene/protein concept extraction |
The corpus consists of 15,000 PubMed sentences as training data and 5,000 PubMed sentences as test data. |
|
BioCreAtivE 2 Gene Mention (GM) dataset [20] |
Gene/protein concept extraction |
The corpus consists of 15,000 PubMed sentences as training data and 5,000 PubMed sentences as test data. |
|
AIMED [21] |
Protein-protein interaction |
The corpus consists of 225 PubMed abstracts that contain 1,987 sentences with 4,075 protein mentions. |
|
HPRD50 (Human Protein Reference Database) [22] |
Protein-protein interaction |
The corpus consists of sentences annotated with protein-protein interactions, drawn from 50 PubMed abstracts. |
|
BioInfer (Bio Information Extraction Resource) [23] |
Protein, gene, and RNA relationships |
The corpus consists of 1,100 sentences annotated with concept names, relationships, and syntactic dependencies. |
|
IEPA (Interaction Extraction Performance Assessment) [24] |
Protein-protein interaction |
The corpus consists of more than 200 PubMed sentences annotated with protein-protein interactions. |
|
BioCreAtivE 2.5 Elsevier Corpus [25] |
Protein-protein interaction |
The corpus consists of 61 PubMed articles as training data and 62 PubMed articles as test data. |
|
BC4GO Corpus [26] |
Gene ontology |
The corpus consists of 1,356 distinct GO terms from 200 PubMed articles. |
|
GREC Corpus [27] |
Gene regulation and gene expression events |
The corpus consists of 240 PubMed abstracts with annotations on gene regulation and gene expression events. |
|
GETM [28] |
Gene expression events |
The corpus consists of 150 PubMed abstracts with annotations for gene expression events. |
|
AnEM [29] |
Tissue, cell, developing anatomical structure, cellular component |
The corpus consists of 500 PubMed sentences with annotations on a variety of biomedical concepts. |
|
CellFinder Corpus [30] |
Anatomical parts, cell lines, cell types, species, and cell components |
The corpus consists of annotations from 10 full-text PubMed articles. |