Author manuscript; available in PMC: 2014 Dec 1.
Published in final edited form as: J Biomed Inform. 2013 Aug 15;46(6):10.1016/j.jbi.2013.08.004. doi: 10.1016/j.jbi.2013.08.004

Table 1.

Numbers of documents, sentences, and entities in the i2b2 and GENIA corpora.

Corpus            # Documents   # Sentences   # Entities
i2b2-Pittsburgh   477           27,627        Problem: 12,586; Treatment: 9,343; Test: 9,225
i2b2-Beth         73            8,798         Problem: 4,187; Treatment: 3,072; Test: 3,036
i2b2-Partners     97            7,517         Problem: 2,885; Treatment: 1,768; Test: 1,570
GENIA             2,000         18,546        protein: 24,966; DNA: 8,557; RNA: 719; cell type: 6,221; cell line: 3,663