Table 6. Vocabulary size V (i.e., number of types) when texts are decomposed into different kinds of types: word-lemma-tag (w-l-t), plain words, lemma-POS (l-pos), lemma-POS of words in the dictionary (l-pos dic), lemmas, and lemmas of words in the dictionary (lemma dic).
Text | w-l-t | word | l-pos | l-pos dic | lemma | lemma dic |
---|---|---|---|---|---|---|
Clarissa | 23624 | 20492 | 17058 | 10315 | 15356 | 9041 |
Moby-Dick | 20777 | 18516 | 15774 | 10426 | 14226 | 9141 |
Ulysses | 32952 | 29450 | 26412 | 14136 | 24089 | 12469 |
Don Quijote | 23359 | 21180 | 11872 | 7906 | 11128 | 7432 |
La Regenta | 24053 | 21871 | 12509 | 10500 | 11768 | 9900 |
Artamène | 31574 | 25161 | 7605 | 5349 | 7177 | 5008 |
Bragelonne | 28803 | 25775 | 12994 | 11342 | 12127 | 10744 |
Seitsemän | 22851 | 22035 | 9749 | 7788 | 9607 | 7658 |
Kevät ja | 26087 | 25071 | 9897 | 9054 | 9733 | 8898 |
Vanhempieni | 37247 | 35931 | 14751 | 13678 | 14566 | 13510 |
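
The six columns differ only in what counts as a "type". As a rough illustration, the counts could be obtained from an annotated token stream along the following lines. This is a minimal sketch, not the procedure used to build the table: the `Token` record, the `vocabulary_sizes` function, and the toy data are hypothetical, and it assumes that a token is "in the dictionary" when its lemma belongs to a given set, which the caption does not specify.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Token(NamedTuple):
    """One annotated token (hypothetical record layout)."""
    word: str   # surface form as it appears in the text
    lemma: str  # lemma assigned by the tagger
    tag: str    # fine-grained morphosyntactic tag
    pos: str    # coarse part of speech


def vocabulary_sizes(tokens: Iterable[Token], dictionary: Set[str]) -> Dict[str, int]:
    """Count distinct types under each decomposition named in the caption.

    Assumption: a token is "in the dictionary" if its lemma is in `dictionary`.
    """
    tokens = list(tokens)
    in_dic = [t for t in tokens if t.lemma in dictionary]
    return {
        "w-l-t": len({(t.word, t.lemma, t.tag) for t in tokens}),
        "word": len({t.word for t in tokens}),
        "l-pos": len({(t.lemma, t.pos) for t in tokens}),
        "l-pos dic": len({(t.lemma, t.pos) for t in in_dic}),
        "lemma": len({t.lemma for t in tokens}),
        "lemma dic": len({t.lemma for t in in_dic}),
    }


# Toy data, invented for illustration only.
demo_tokens = [
    Token("whales", "whale", "NNS", "NOUN"),
    Token("whale", "whale", "NN", "NOUN"),
    Token("sailed", "sail", "VBD", "VERB"),
    Token("sail", "sail", "NN", "NOUN"),
    Token("Queequeg", "queequeg", "NNP", "PROPN"),
    Token("whale", "whale", "NN", "NOUN"),  # repeated token, adds no new type
]
demo_dictionary = {"whale", "sail"}

sizes = vocabulary_sizes(demo_tokens, demo_dictionary)
for name, size in sizes.items():
    print(f"{name}: {size}")
```

Under this reading, each column is the size of a set of keys, so coarser keys (lemma instead of word, or dropping the POS) can only merge types and never split them. That is consistent with the rows above, where V does not increase from w-l-t to word, from l-pos to lemma, or from any column to its dictionary-restricted counterpart.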