2023 Mar 1;23(5):2708. doi: 10.3390/s23052708

Table 6. Tokens, vocabulary, and type-token ratio (TTR).

| Preprocessing Steps | No. of Tokens | Vocabulary Size | % of Tokens in Vocabulary | TTR |
|---|---|---|---|---|
| After tokenization | 1,167,630 | 89,696 | 7.681885529 | 0.077 |
| After stopwords removal | 870,521 | 89,003 | 10.22410717 | 0.102 |
| After punctuation removal | 746,292 | 88,987 | 11.92388502 | 0.119 |
| Alphanumeric to alphabetic word | 746,292 | 86,271 | 11.5599524 | 0.116 |
| After single-letter word removal | 620,133 | 86,098 | 13.8837959 | 0.139 |
| After lemmatization | 620,133 | 50,043 | 8.069720528 | 0.081 |
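
The TTR column is consistent with the usual definition of the type-token ratio: vocabulary size (distinct tokens) divided by the total number of tokens, with the percentage column being the same ratio multiplied by 100. The following is a minimal sketch under that assumption, not the authors' code. The names `type_token_ratio`, `table_ttr`, and `TABLE_6` are hypothetical, and the vocabulary size for the punctuation-removal row is taken as 88,987, the value implied by that row's percentage (11.92388502% of 746,292).

```python
def type_token_ratio(tokens):
    """Return (token count, vocabulary size, TTR) for a list of tokens."""
    n_tokens = len(tokens)
    vocab_size = len(set(tokens))  # distinct token types
    ttr = vocab_size / n_tokens if n_tokens else 0.0
    return n_tokens, vocab_size, ttr


# (step, number of tokens, vocabulary size) as reported in Table 6.
# The 88,987 entry is inferred from the percentage column (assumption).
TABLE_6 = [
    ("After tokenization", 1_167_630, 89_696),
    ("After stopwords removal", 870_521, 89_003),
    ("After punctuation removal", 746_292, 88_987),
    ("Alphanumeric to alphabetic word", 746_292, 86_271),
    ("After single-letter word removal", 620_133, 86_098),
    ("After lemmatization", 620_133, 50_043),
]


def table_ttr(rows):
    """Recompute TTR (rounded to 3 decimals) for each preprocessing step."""
    return {step: round(vocab / tokens, 3) for step, tokens, vocab in rows}


if __name__ == "__main__":
    for step, ttr in table_ttr(TABLE_6).items():
        print(f"{step}: {ttr}")
```

Recomputing the ratio this way reproduces the reported TTR values, which is also a quick check that the token and vocabulary counts in each row agree with each other.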