2013 Mar 13;20(5):931–939. doi: 10.1136/amiajnl-2012-001453

Table 1.

Distributional corpus characteristics

Corpus                                 No. of sentences  No. of tokens  No. of unique words
WSJ non-clinical corpus                8887              210 413        17 196
IHC clinical corpus                    524               10 233         2442
Pitt clinical corpus                   1036              12 227         2100
Combined IHC and Pitt clinical corpus  1560              22 460         3558

IHC, Intermountain Healthcare; Pitt, University of Pittsburgh; WSJ, Wall Street Journal.