2019 Aug 2;11:205. doi: 10.3389/fnagi.2019.00205

Table 2.

Language features (26 features).

Total words is the total number of words produced (excluding filled pauses, unintelligible words, and false starts) (1 feature).
Mean length of sentence (MLS) is the total number of words in the narrative divided by the number of sentences (1 feature).
Phrase type proportion is derived from work on rating the fluency of machine translations (Chae and Nenkova, 2009). It is the total number of words belonging to a given phrase type (here prepositional phrases, noun phrases, and verb groups) divided by the total number of words in the narrative. We additionally extend this feature to clauses, namely main finite clauses, main infinitive clauses, and subordinate clauses (6 features).
Part-of-speech ratios are computed for: the ratio of nouns to verbs, the ratio of pronouns to nouns, the ratio of determiners to nouns, and the ratio of open-class words to closed-class words (4 features).
Proportion of verbs in the present tense is computed as a proxy for discourse type, as presented in Drummond et al. (2015), distinguishing between a descriptive style (mostly present tense) and a narrative style (mostly past tense) (1 feature).
Median word frequency is estimated using word frequencies from the modern Swedish section of the Korp corpus (Borin et al., 2012) (1 feature).
Type-token ratio (TTR) is calculated by dividing the number of unique word types by the total number of tokens in the narrative (1 feature).
Information unit counts are computed for each of the information unit categories listed by Croisile et al. (1996), namely the three subjects, eleven objects, two places, and seven actions. These counts are extracted using a keyword-spotting method with some manual correction. The total count for each category is then normalized by the total number of words in the narrative (4 features).
Content density and content efficiency are computed by counting the total number of information units mentioned (including repetitions) and dividing by the total number of words and the total time, respectively (2 features).
Propositional density is calculated as the ratio of propositions (verbs, adjectives, adverbs, prepositions, and conjunctions) to the total number of words (Mueller et al., 2017) (1 feature).
Dysfluency marker counts are computed by counting the number of filled pauses, false starts, and incomplete sentences, each normalized by the total number of words. An overall dysfluency index is also computed by summing the counts from the three categories and dividing by the total number of words (4 features).
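The count-based features (Total words, MLS, and TTR) can be illustrated with a minimal Python sketch. It assumes the transcript has already been split into sentences and that each token carries a kind label. The `Token` class, the kind names, lower-casing, and the choice to exclude the same token kinds from the TTR counts are illustrative assumptions, not details given in the table.

```python
from dataclasses import dataclass

# Assumed annotation: every token carries a kind label. These three kinds
# are excluded from the word count, following the Total words definition.
EXCLUDED_KINDS = {"filled_pause", "unintelligible", "false_start"}


@dataclass
class Token:
    text: str
    kind: str = "word"  # hypothetical labels: "word" or one of EXCLUDED_KINDS


def counted_words(sentences):
    """Flatten a list of sentences (lists of Token) and keep countable words."""
    return [tok for sent in sentences for tok in sent if tok.kind not in EXCLUDED_KINDS]


def total_words(sentences):
    return len(counted_words(sentences))


def mean_length_of_sentence(sentences):
    """Total words divided by the number of sentences (0.0 if there are none)."""
    return total_words(sentences) / len(sentences) if sentences else 0.0


def type_token_ratio(sentences):
    """Unique (lower-cased) word types divided by word tokens."""
    words = [tok.text.lower() for tok in counted_words(sentences)]
    return len(set(words)) / len(words) if words else 0.0


# Toy two-sentence narrative with one filled pause and one false start.
narrative = [
    [Token("The"), Token("boy"), Token("uh", "filled_pause"),
     Token("takes"), Token("the"), Token("cookie")],
    [Token("The"), Token("wo", "false_start"), Token("woman"), Token("dries")],
]
print(total_words(narrative), mean_length_of_sentence(narrative), type_token_ratio(narrative))
# prints: 8 4.0 0.75
```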
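The phrase type proportion can be sketched as follows, assuming a chunker or parser (not specified here) has already produced labelled word-index spans. The label names are hypothetical, and counting each word at most once per type when spans of the same type nest is an assumption rather than a documented detail.

```python
# Hypothetical labels for the six phrase and clause types in the table.
PHRASE_TYPES = ("PP", "NP", "VG", "MAIN_FINITE", "MAIN_INF", "SUBORD")


def phrase_type_proportions(n_words, spans):
    """Proportion of words covered by each phrase type.

    spans: iterable of (label, start, end) word-index spans, end exclusive.
    A word counts at most once per type, even if same-type spans nest.
    """
    covered = {label: set() for label in PHRASE_TYPES}
    for label, start, end in spans:
        if label in covered:
            covered[label].update(range(start, end))
    return {label: (len(idx) / n_words if n_words else 0.0)
            for label, idx in covered.items()}


# "the boy takes a cookie from the jar" (8 words), with an NP nested in the PP.
spans = [("NP", 0, 2), ("VG", 2, 3), ("NP", 3, 5),
         ("PP", 5, 8), ("NP", 6, 8), ("MAIN_FINITE", 0, 8)]
props = phrase_type_proportions(8, spans)
print(props["NP"], props["PP"], props["VG"])  # prints: 0.75 0.375 0.125
```

Using a set of word indices per label is what keeps nested spans of the same type from being double-counted; words may still count toward several different types.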
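The part-of-speech ratios, the present-tense proportion, and propositional density all reduce to tag counts. This sketch assumes tokens tagged with Universal Dependencies UPOS tags and morphological features. The open/closed-class split follows the UD guidelines, but counting only NOUN as nouns, mapping "propositions" to VERB, ADJ, ADV, ADP, CCONJ, and SCONJ, and using all VERB tokens as the present-tense denominator are illustrative assumptions, not the original pipeline.

```python
from collections import Counter

# Assumed tag set: Universal Dependencies UPOS, split into open and closed
# classes as in the UD guidelines (PUNCT, SYM and X fall in neither).
OPEN_CLASS = {"ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"}
CLOSED_CLASS = {"ADP", "AUX", "CCONJ", "DET", "NUM", "PART", "PRON", "SCONJ"}
# Propositions per the table: verbs, adjectives, adverbs, prepositions, conjunctions.
PROPOSITION_TAGS = {"VERB", "ADJ", "ADV", "ADP", "CCONJ", "SCONJ"}


def ratio(num, den):
    return num / den if den else 0.0


def pos_features(tagged):
    """tagged: (form, upos, feats) triples for the counted words only,
    where feats is a dict such as {"Tense": "Pres"}."""
    tags = Counter(upos for _, upos, _ in tagged)
    verb_feats = [feats for _, upos, feats in tagged if upos == "VERB"]
    n_present = sum(1 for feats in verb_feats if feats.get("Tense") == "Pres")
    return {
        "noun_verb": ratio(tags["NOUN"], tags["VERB"]),
        "pronoun_noun": ratio(tags["PRON"], tags["NOUN"]),
        "determiner_noun": ratio(tags["DET"], tags["NOUN"]),
        "open_closed": ratio(sum(tags[t] for t in OPEN_CLASS),
                             sum(tags[t] for t in CLOSED_CLASS)),
        "present_tense_verbs": ratio(n_present, len(verb_feats)),
        "propositional_density": ratio(sum(tags[t] for t in PROPOSITION_TAGS),
                                       len(tagged)),
    }


tagged = [
    ("the", "DET", {}), ("boy", "NOUN", {}), ("takes", "VERB", {"Tense": "Pres"}),
    ("a", "DET", {}), ("cookie", "NOUN", {}), ("and", "CCONJ", {}),
    ("he", "PRON", {}), ("fell", "VERB", {"Tense": "Past"}),
]
print(pos_features(tagged))
```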
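Median word frequency can be sketched with the standard library, assuming a precomputed lookup table from lower-cased word form to corpus frequency. The `freq_table` values below are invented toy numbers, not Korp statistics, and both computing the median over tokens (rather than types) and skipping words missing from the table are assumptions.

```python
from statistics import median


def median_word_frequency(words, freq_table):
    """Median corpus frequency over the word tokens found in freq_table.

    Tokens absent from the table are skipped (an assumption; they could
    instead be assigned a floor frequency).
    """
    freqs = [freq_table[w.lower()] for w in words if w.lower() in freq_table]
    return median(freqs) if freqs else 0.0


# Toy frequencies per million words (invented for illustration only).
freq_table = {"pojken": 120.0, "tar": 800.0, "kakan": 15.0, "och": 30000.0}
print(median_word_frequency(["Pojken", "tar", "kakan", "och", "xyzzy"], freq_table))
# prints: 460.0
```

A median is less sensitive than a mean to a few very frequent function words such as "och", which is one plausible reason to prefer it here.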
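The information unit features, content density, and content efficiency can be sketched with single-word keyword spotting. The keyword list here is a hypothetical fragment, not the Croisile et al. (1996) inventory; the manual correction step is omitted; and the duration is assumed to be in seconds. Repeated mentions are counted each time, as the table specifies.

```python
from collections import Counter

# Hypothetical keyword -> category map. A real list would cover all 23
# information units from Croisile et al. (1996), including multi-word actions.
KEYWORDS = {
    "boy": "subject", "girl": "subject", "woman": "subject",
    "cookie": "object", "jar": "object", "stool": "object",
    "kitchen": "place",
    "falling": "action",
}
CATEGORIES = ("subject", "object", "place", "action")


def information_unit_features(words, duration_s):
    """Per-category IU counts normalized by word count, plus content
    density (IU mentions per word) and efficiency (IU mentions per second)."""
    hits = Counter(KEYWORDS[w.lower()] for w in words if w.lower() in KEYWORDS)
    n = len(words)
    feats = {f"iu_{c}": (hits[c] / n if n else 0.0) for c in CATEGORIES}
    total = sum(hits.values())  # repeated mentions are counted each time
    feats["content_density"] = total / n if n else 0.0
    feats["content_efficiency"] = total / duration_s if duration_s else 0.0
    return feats


words = "the boy is falling and the boy wants a cookie from the jar".split()
features = information_unit_features(words, duration_s=10.0)
print(features["content_density"], features["content_efficiency"])
```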
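The dysfluency features are plain normalized counts. In this sketch the three counts are assumed to come from transcript annotation, and the denominator is assumed to be the Total words value. Because all four features share one denominator, the overall index equals the sum of the three normalized counts.

```python
def dysfluency_features(n_filled_pauses, n_false_starts, n_incomplete, total_words):
    """Each marker count divided by total words, plus the combined index."""
    names = ("filled_pauses", "false_starts", "incomplete_sentences")
    counts = (n_filled_pauses, n_false_starts, n_incomplete)
    if total_words == 0:
        return {**{k: 0.0 for k in names}, "dysfluency_index": 0.0}
    feats = {k: c / total_words for k, c in zip(names, counts)}
    feats["dysfluency_index"] = sum(counts) / total_words
    return feats


# Toy narrative: 4 filled pauses, 2 false starts, 1 incomplete sentence, 70 words.
print(dysfluency_features(4, 2, 1, total_words=70))
```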