2016 May 17;7:703. doi: 10.3389/fpsyg.2016.00703

Table 3.

Quantitative description of the different corpora used.

Corpus        Number of types   Number of tokens   Average document size   Number of documents
TASA          57,800            5,285,933          140.41                  37,600
Wikipedia     66,035            7,015,782          175.39                  40,000
Fiction       66,632            3,964,482          101.56                  40,000
Non-Fiction   60,917            2,860,230          114.41                  25,000
Mixed         81,349            13,134,480         131.35                  100,000
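
As a rough illustration of how the four quantities in the table relate to one another, the sketch below computes them for a list of documents. It is only a minimal sketch: the function name `corpus_stats`, the lowercasing, and the whitespace tokenization are assumptions made for this example, not the preprocessing used for the corpora above.

```python
from collections import Counter


def corpus_stats(documents):
    """Return (types, tokens, average document size, number of documents).

    Assumption: tokens are lowercased, whitespace-separated strings.
    The corpora in the table may have been tokenized differently.
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())

    n_types = len(counts)            # distinct word forms
    n_tokens = sum(counts.values())  # total word occurrences
    n_docs = len(documents)
    # Average document size is taken here as tokens per document.
    avg_size = n_tokens / n_docs if n_docs else 0.0
    return n_types, n_tokens, avg_size, n_docs


toy_corpus = ["the cat sat", "the dog sat down"]
print(corpus_stats(toy_corpus))  # → (5, 7, 3.5, 2)
```

Under this tokens-per-document reading, the Wikipedia row gives 7,015,782 / 40,000 ≈ 175.39 and the Non-Fiction row gives 2,860,230 / 25,000 ≈ 114.41, matching the reported averages. Other rows differ slightly from this ratio, so the published averages may have been computed at a different preprocessing stage.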