Total words |
Overall count of all phonological entities
spoken, including real words, nonwords, and partial words |
Filler words |
Count of filled pauses (e.g.,
“uh”, “um”, “hmm”), as a
percentage of total word count |
Empty words |
Count of empty words (e.g.,
“thing”, “place”, “stuff”), as
a percentage of total word count |
Lexical frequency |
Mean of the log of the frequency of all real
words spoken |
Type-token ratio |
Ratio of unique words (types) to total words
(tokens) spoken, used as a measure of vocabulary size and lexical
diversity; higher values mean the speaker produced a more varied
vocabulary |
Honoré’s statistic |
Measure of lexical richness/diversity based on
the number of words produced exactly once; higher values mean more
diverse speech. It is calculated as: (100 * log(tokens)) / (1 -
V1/types), where V1=number of words spoken
exactly once |
Brunet’s index |
Measure of lexical richness (i.e., degree of
variation in vocabulary), which is less biased by text length,
calculated from the total number of words produced (tokens) and the
number of unique words (types); lower values mean richer speech. It is
calculated as: tokens ^ (types ^ (−0.165)) |
Speech rate |
Count of total words divided by total elapsed
time of the speech (in words per second) |
Filler rate |
Count of filler words divided by total elapsed
time of the speech (in filler words per second) |
Definite articles |
Count of uses of “the”, as a
percentage of total word count |
Indefinite articles |
Count of uses of “a” and
“an”, as a percentage of total word count |
Pronouns |
Count of pronouns, as a percentage of total
word count |
Nouns |
Count of nouns, as a percentage of total word
count |
Verbs |
Count of verbs, as a percentage of total word
count |
Determiners |
Count of determiners, as a percentage of total
word count |
Content words |
Count of all words that are not function words (as
defined by the list of stop words in NLTK), as a percentage of total
word count |
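
The lexical diversity and rate measures defined above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the whitespace tokenization in the usage example, and the example filler set are hypothetical. The natural logarithm is assumed for Honoré's statistic, since the table does not state a base.

```python
import math
from collections import Counter


def lexical_richness(tokens):
    """Compute type-token ratio, Honoré's statistic, and Brunet's index.

    `tokens` is a non-empty list of lowercased word strings
    (assumed already tokenized).
    """
    n_tokens = len(tokens)
    counts = Counter(tokens)
    n_types = len(counts)
    # V1: number of word types spoken exactly once
    v1 = sum(1 for c in counts.values() if c == 1)

    ttr = n_types / n_tokens

    # Honoré's statistic is undefined when every type occurs exactly once
    # (the denominator becomes zero); returning infinity is an assumption.
    if v1 == n_types:
        honore = float("inf")
    else:
        honore = 100 * math.log(n_tokens) / (1 - v1 / n_types)

    # Brunet's index: tokens ^ (types ^ -0.165); lower means richer speech
    brunet = n_tokens ** (n_types ** -0.165)

    return {"ttr": ttr, "honore": honore, "brunet": brunet}


def percentage_of(tokens, targets):
    """Count of tokens found in `targets`, as a percentage of total word count."""
    return 100 * sum(1 for t in tokens if t in targets) / len(tokens)


def rate_per_second(count, elapsed_seconds):
    """Count divided by total elapsed time of the speech, in items per second."""
    return count / elapsed_seconds


# Usage with a toy transcript (8 tokens, 6 types, 5 types spoken once)
sample = "the cat sat on the mat the end".split()
richness = lexical_richness(sample)
definite_pct = percentage_of(sample, {"the"})       # definite articles
speech_rate = rate_per_second(len(sample), 4.0)     # assumes 4 s of speech
```

The same `percentage_of` helper would cover the other percentage-based rows (filler words, empty words, indefinite articles) given a suitable word set. The part-of-speech rows would additionally need a tagger, which this sketch does not include.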