PLoS One. 2015 Jul 9;10(7):e0129031. doi: 10.1371/journal.pone.0129031

Table 5. Coverage of the vocabulary by the dictionary in each language, both at the word-type and at the token level.

The average for all texts is also included. Remember that we distinguish between a word type (corresponding to its orthographic form) and its tokens (actual occurrences in text).

Title             Tokens    Types
Clarissa          96.9%     68.0%
Moby-Dick         94.7%     70.8%
Ulysses           90.4%     58.6%
Don Quijote       97.0%     81.3%
La Regenta        97.9%     89.5%
Artamène          83.6%     43.6%
Bragelonne        97.5%     89.8%
Seitsemän v.      95.4%     89.8%
Kevät ja t.       98.3%     96.2%
Vanhempieni r.    98.5%     96.5%
Average           95.0%     78.4%
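
The type/token distinction in the caption can be made concrete with a minimal sketch. It assumes a text that has already been split into word forms and a dictionary given as a plain set of word forms. The function name `coverage`, the whitespace tokenization, and the toy data are illustrative assumptions, not the procedure used to produce the table.

```python
from collections import Counter


def coverage(tokens, dictionary):
    """Return (token_coverage, type_coverage) as fractions in [0, 1].

    tokens:     list of word forms in reading order (one item per occurrence)
    dictionary: set of word forms regarded as known
    """
    counts = Counter(tokens)  # maps each type to its number of tokens
    if not counts:
        raise ValueError("cannot compute coverage of an empty text")
    covered_types = [w for w in counts if w in dictionary]
    # Token level: share of all occurrences whose word form is in the dictionary.
    token_cov = sum(counts[w] for w in covered_types) / sum(counts.values())
    # Type level: share of distinct word forms that are in the dictionary.
    type_cov = len(covered_types) / len(counts)
    return token_cov, type_cov


# Toy data, invented for illustration only.
tokens = "the whale the sea the whale leviathanic".split()
dictionary = {"the", "whale", "sea"}

token_cov, type_cov = coverage(tokens, dictionary)
print(f"tokens: {token_cov:.1%}, types: {type_cov:.1%}")
# prints "tokens: 85.7%, types: 75.0%"
```

In the toy text the single uncovered form occurs only once, so it costs one type in four but only one token in seven. This is the same pattern as in the table, where token coverage is at least as high as type coverage for every text.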