2014 Feb 5;5:6. doi: 10.1186/2041-1480-5-6

Table 1.

Corpora statistics

Corpus     With stop words                Without stop words             Segments
Clinical   ∼42.5M tokens (∼0.4M types)    ∼22.5M tokens (∼0.4M types)    268,727 documents
Medical    ∼20.3M tokens (∼0.3M types)    ∼12.1M tokens (∼0.3M types)    1,153,824 sentences

The number of tokens and unique terms (word types) in the medical and clinical corpora, with and without stop words.
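
To make the distinction between tokens and types concrete, the following is a minimal sketch of how such counts could be computed. It is not the authors' pipeline: the whitespace tokenization, the lowercasing, the `STOP_WORDS` list, the `corpus_stats` helper, and the two example segments are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only; the source does not
# say which list was used.
STOP_WORDS = frozenset({"the", "of", "and", "a", "in", "was", "with"})


def corpus_stats(segments, stop_words=frozenset()):
    """Return (token_count, type_count) for a list of text segments.

    Tokens are lowercased, whitespace-separated strings (an assumed,
    simplified tokenization). Types are the distinct tokens.
    """
    counts = Counter(
        tok
        for seg in segments
        for tok in seg.lower().split()
        if tok not in stop_words
    )
    return sum(counts.values()), len(counts)


# Made-up segments standing in for documents or sentences.
segments = [
    "The patient was admitted with chest pain",
    "Pain in the chest and the arm",
]

with_stop = corpus_stats(segments)
without_stop = corpus_stats(segments, STOP_WORDS)
print("with stop words:   ", with_stop)
print("without stop words:", without_stop)
```

In this toy input, removing stop words halves the token count but removes only a handful of types. The table shows the same pattern: token counts drop substantially while the approximate type counts are unchanged, because stop words are few in number but very frequent.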