Author manuscript; available in PMC: 2023 Sep 1.

Published in final edited form as: Appl Neuropsychol Adult. 2020 Dec 30;29(5):1250–1257. doi: 10.1080/23279095.2020.1864733

Table 3.

Lexical-semantic features that were calculated on the transcripts of spontaneous speech produced by participants retelling the Cinderella story.

Feature name	Description
Total words	Overall count of all phonological entities spoken, including real words, nonwords, and partial words
Filler words	Count of filled pauses (e.g., “uh”, “um”, “hmm”), as a percentage of total word count
Empty words	Count of empty words (e.g., “thing”, “place”, “stuff”), as a percentage of total word count
Lexical frequency	Mean of the log of the frequency of all real words spoken
Type-token ratio	Ratio of unique words (types) to total words (tokens), used as a measure of vocabulary size and lexical diversity; higher values mean the speaker produced a more varied vocabulary
Honoré’s statistic	Measure of lexical richness/diversity based on the number of words produced exactly once; higher values mean more diverse speech. It is calculated as: (100 * log(tokens)) / (1 - V₁/types), where V₁=number of words spoken exactly once
Brunet’s index	Measure of lexical richness (i.e., degree of variation in vocabulary) that is less sensitive to text length, calculated from the total number of words produced (tokens) and the number of unique words (types); lower values mean richer speech. It is calculated as: tokens ^ (types ^ (−0.165))
Speech rate	Count of total words divided by total elapsed time of the speech (in words per second)
Filler rate	Count of filler words divided by total elapsed time of the speech (in words per second)
Definite articles	Count of uses of “the”, as a percentage of total word count
Indefinite articles	Count of uses of “a” and “an”, as a percentage of total word count
Pronouns	Count of pronouns, as a percentage of total word count
Nouns	Count of nouns, as a percentage of total word count
Verbs	Count of verbs, as a percentage of total word count
Determiners	Count of determiners, as a percentage of total word count
Content words	All words that are not function words (as defined by the list of stop words in NLTK), as a percentage of total word count
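Several of the features above are simple functions of the token list and the elapsed speaking time. The sketch below shows how the type-token ratio, Honoré’s statistic, Brunet’s index, the percentage features, and the per-second rates could be computed. It assumes a few things that the table does not specify: a non-empty, lowercased, whitespace-tokenized transcript; the natural logarithm in Honoré’s statistic; and a filler set limited to the table’s own examples. All function names are hypothetical. Lexical frequency is omitted because it depends on an external word-frequency norm.

```python
import math
from collections import Counter

# Hypothetical filler set, taken only from the examples listed in the table.
FILLERS = {"uh", "um", "hmm"}


def type_token_ratio(tokens):
    """Unique words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens)


def honore_statistic(tokens):
    """100 * log(N) / (1 - V1 / V), where N = tokens, V = types, and
    V1 = number of words spoken exactly once (natural log assumed)."""
    n = len(tokens)
    counts = Counter(tokens)
    v = len(counts)
    v1 = sum(1 for c in counts.values() if c == 1)
    if v1 == v:
        # Every word occurs exactly once: the denominator is zero.
        return math.inf
    return 100 * math.log(n) / (1 - v1 / v)


def brunet_index(tokens):
    """N ** (V ** -0.165); lower values indicate richer vocabulary."""
    n = len(tokens)
    v = len(set(tokens))
    return n ** (v ** -0.165)


def percent_of_total(tokens, targets):
    """Count of tokens in `targets`, as a percentage of all tokens."""
    return 100 * sum(1 for t in tokens if t in targets) / len(tokens)


def rate_per_second(count, seconds):
    """Count divided by elapsed speaking time in seconds."""
    return count / seconds


if __name__ == "__main__":
    # Toy transcript (not study data) lasting an assumed 6 seconds.
    tokens = "the girl um went to the ball and the girl uh danced".split()
    print(type_token_ratio(tokens))                  # 0.75
    print(percent_of_total(tokens, {"the"}))         # 25.0
    print(percent_of_total(tokens, FILLERS))
    print(honore_statistic(tokens))
    print(brunet_index(tokens))
    print(rate_per_second(len(tokens), 6.0))         # 2.0
```

In the toy transcript, 7 of the 9 types occur exactly once. Honoré’s statistic is therefore 100 · ln(12) / (2/9). The guard returns infinity when every word is a hapax legomenon, which is one possible convention for the undefined case.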