Table 1.
Corpus | With stop words | Without stop words | Segments |
---|---|---|---|
Clinical | ∼42.5M tokens (∼0.4M types) | ∼22.5M tokens (∼0.4M types) | 268,727 documents |
Medical | ∼20.3M tokens (∼0.3M types) | ∼12.1M tokens (∼0.3M types) | 1,153,824 sentences |
The number of tokens and unique terms (word types) in the medical and clinical corpora, with and without stop words.