Author manuscript; available in PMC: 2023 Sep 1.
Published in final edited form as: Appl Neuropsychol Adult. 2020 Dec 30;29(5):1250–1257. doi: 10.1080/23279095.2020.1864733

Table 3.

Lexical-semantic features that were calculated on the transcripts of spontaneous speech produced by participants retelling the Cinderella story.

| Feature name | Description |
| --- | --- |
| Total words | Overall count of all phonological entities spoken, including real words, nonwords, and partial words |
| Filler words | Count of filled pauses (e.g., “uh”, “um”, “hmm”), as a percentage of total word count |
| Empty words | Count of empty words (e.g., “thing”, “place”, “stuff”), as a percentage of total word count |
| Lexical frequency | Mean of the log of the frequency of all real words spoken |
| Type-token ratio | Ratio of unique words (types) to total words (tokens) spoken, used as a measure of vocabulary size and lexical diversity; higher values mean the speaker produced a more varied vocabulary |
| Honoré’s statistic | Measure of lexical richness/diversity based on the number of words produced exactly once; higher values mean more diverse speech. It is calculated as: (100 * log(tokens)) / (1 − V1/types), where V1 = number of words spoken exactly once |
| Brunet’s index | Measure of lexical richness (i.e., degree of variation in vocabulary) that is less biased by text length, calculated from the total number of words produced (tokens) and the number of unique words (types); lower values mean richer speech. It is calculated as: tokens ^ (types ^ (−0.165)) |
| Speech rate | Count of total words divided by total elapsed time of the speech (in words per second) |
| Filler rate | Count of filler words divided by total elapsed time of the speech (in filler words per second) |
| Definite articles | Count of uses of “the”, as a percentage of total word count |
| Indefinite articles | Count of uses of “a” and “an”, as a percentage of total word count |
| Pronouns | Count of pronouns, as a percentage of total word count |
| Nouns | Count of nouns, as a percentage of total word count |
| Verbs | Count of verbs, as a percentage of total word count |
| Determiners | Count of determiners, as a percentage of total word count |
| Content words | Count of all words that are not function words (as defined by the list of stop words in NLTK), as a percentage of total word count |
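
The count-based formulas in the table can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `lexical_features`, the assumption that the transcript is already split into lowercase tokens, the use of the natural logarithm in Honoré's statistic, and the default filler list (taken from the examples in the table) are all assumptions made for illustration.

```python
import math
from collections import Counter


def lexical_features(tokens, duration_seconds, fillers=("uh", "um", "hmm")):
    """Compute a few of the table's features from a pre-tokenized transcript.

    Hypothetical helper: tokenization, the log base, and the filler list
    are illustrative assumptions, not taken from the source study.
    """
    if not tokens or duration_seconds <= 0:
        raise ValueError("need at least one token and a positive duration")

    counts = Counter(tokens)
    n_tokens = len(tokens)            # total words (tokens)
    n_types = len(counts)             # unique words (types)
    v1 = sum(1 for c in counts.values() if c == 1)  # words spoken exactly once
    n_fillers = sum(counts[f] for f in fillers)

    # Honoré's statistic is undefined when every type occurs exactly once
    # (the denominator is zero); this sketch reports infinity in that case.
    hapax_ratio = v1 / n_types
    if hapax_ratio == 1:
        honore = math.inf
    else:
        honore = 100 * math.log(n_tokens) / (1 - hapax_ratio)

    return {
        "total_words": n_tokens,
        "filler_pct": 100 * n_fillers / n_tokens,
        "type_token_ratio": n_types / n_tokens,
        "honore": honore,
        # Brunet's index: tokens ^ (types ^ -0.165)
        "brunet": n_tokens ** (n_types ** -0.165),
        "speech_rate": n_tokens / duration_seconds,
        "filler_rate": n_fillers / duration_seconds,
    }


# Usage with a made-up eight-word utterance lasting four seconds.
features = lexical_features(
    "the prince um found the the glass slipper".split(), duration_seconds=4.0
)
```

The part-of-speech and lexical-frequency features are left out because they depend on a tagger and a frequency norm that the table does not specify.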