Table 2. Three corpora.
Corpus | Description | Size in words |
Medicine | Clinical journal article abstracts from the PubMed database | 113,007,884 |
Novels | 19th-century literature, written in or translated into English | 10,099,229 |
News | The Reuters corpus of news stories published between August 20, 1996, and August 19, 1997 | 207,833,336 |