2009 Dec 9;10:403. doi: 10.1186/1471-2105-10-403

Table 1.

Characteristics of corpora

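|  |  | AIMed | GENETAG | GENIA |
| --- | --- | --- | --- | --- |
| Size | abstracts | 225 | - | 1,999 |
| Size | sentences | 1,987 | 10,000 | 18,554 |
| Entity | scope | human P/G | P/G/R | human P/G/R |
| Entity | number | 4,075 | 11,739 | 34,264 (P) / 10,002 (G) / 944 (R) |
| Entity | coverage | specific occurrence | specific occurrence | all occurrences |
| Entity | type | no | no | Ontology: 7 types (P) / 5 types (G) / 5 types (R) |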

Legend:

Size: Number of abstracts or sentences in the corpus used in this research

Entity scope: Types of the named entities identified in the corpus: (P)rotein, (G)ene, (R)NA

Entity number: Number of the annotated in-scope entities in the corpus

Entity coverage: Coverage of in-scope entity occurrences in each sentence

Entity type: Explicit identification of the types of the annotated in-scope entities