. 2024 Sep 13;35(3):1178–1217. doi: 10.1007/s40593-024-00426-w

Table 8. Feature types

Feature type | Features
Length features

1. Nb. of words

2. Nb. of unique tokens

3. Nb. of letters

4. Nb. of sentences

5. Nb. of paragraphs

6. Nb. of syllables

Occurrence features

1. Nb. of nouns

2. Nb. of verbs

3. Nb. of adjectives

4. Nb. of conjunctions

5. Nb. of adverbs

6. Nb. of possessive pronouns

7. Nb. of unique nouns

8. Nb. of unique verbs

9. Nb. of unique adjectives

10. Nb. of unique adverbs

11. Nb. of “wh”-adverbs

12. Nb. of determiners

13. Nb. of lexical words

14. Nb. of unique lexical words

15. Nb. of foreign words

16. Nb. of stopwords

17. Nb. of formal words

18. Nb. of deictic words

19. Nb. of symbols

20. Nb. of punctuation marks

Error features

1. Nb. of errors

2. Nb. of grammar errors

3. Nb. of punctuation errors

4. Nb. of typographical errors

5. Ratio Nb. of errors / words

6. Ratio Nb. of grammar errors / words

7. Ratio Nb. of punctuation errors / words

8. Ratio Nb. of typographical errors / words

Morphological complexity

1. Nb. of comparatives

2. Nb. of superlatives

3. Nb. of finite verbs

4. Nb. of non-third-person singular verbs

5. Nb. of infinitive verbs

6. Ratio of comparatives

7. Ratio of superlatives

8. Ratio of finite verbs

9. Ratio of non-third-person singular verbs

10. Ratio of infinitive verbs

Cohesion

1. Nb. of connectors

2. Nb. of unique connectors

3. Mean noun overlap with previous sentence

4. Mean verb overlap with previous sentence

5. SD noun overlap with previous sentence

6. SD verb overlap with previous sentence

7. Ratio of connectors

8. Ratio of unique connectors

Readability

1. Flesch Score

2. Dale-Chall Score

3. Gunning Fog Index

4. Integration Cost

5. Average nb. of sentences per 100 words

6. Average nb. of words per 100 letters

7. Words per sentence

8. Type-token ratio easy words

9. Type-token ratio easy nouns

10. Type-token ratio easy verbs

11. Type-token ratio easy adverbs

12. Type-token ratio easy adjectives

13. Integration cost

14. Heylighen F-Score

Lexical diversity

1. Type-token ratio

2. Type-token ratio nouns

3. Type-token ratio verbs

4. Type-token ratio adjectives

5. Type-token ratio conjunctions

6. Type-token ratio lexical words

7. Type-token ratio functional words

8. Type-token ratio deictic words

9. Type-token ratio “wh”-adverbs

10. Type-token ratio infinitive verbs

11. Global edit distance

Lexical sophistication

1. BNC easy words

2. NGSL easy words

3. SUBLEX easy words

4. BNC easy nouns

5. NGSL easy nouns

6. SUBLEX easy nouns

7. BNC easy verbs

8. NGSL easy verbs

9. SUBLEX easy verbs

10. Brown Frequencies token

11. Brown Frequencies type

12. Brown Frequencies lex. words

13. Brown Frequencies func. words

14. Thorndike Frequencies token

15. Thorndike Frequencies type

16. Thorndike Frequencies lex. words

17. Thorndike Frequencies func. words

18. MRC Frequencies token

19. MRC Frequencies type

20. MRC Frequencies lex. words

21. MRC Frequencies func. words

Syntactic complexity

1. Nb. of subordinate clauses

2. Nb. of fragment sentences

3. Nb. of noun phrases

4. Mean tokens before main verb

5. Nb. of complex noun phrases

6. Nb. of unknown constituents

7. Nb. of postnominal modifiers per complex noun phrase

8. Integration cost

9. Ratio of subordinate clauses

10. Ratio of fragment sentences

11. Ratio of noun phrases

12. SD tokens before main verb

13. Ratio of complex noun phrases

14. Ratio of unknown constituents

15. Ratio of postnominal modifiers per complex noun phrase

We additionally calculated several ratios and distribution parameters (i.e., means and standard deviations) for some of the features.
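The following is a minimal sketch of how a few of the listed features might be computed. These are the length counts, the type-token ratio, an errors-per-word ratio, words per sentence, the Flesch score, and the mean/SD of overlap with the previous sentence. The table does not say which tools were used, so every choice here is an assumption for illustration only. That includes the regex tokenizer, the vowel-group syllable heuristic, and all function and key names (e.g. `extract_features`), which are hypothetical. Plain word overlap stands in for noun/verb overlap because those need a POS tagger. The error count is passed in rather than detected.

```python
import re
import statistics

# Hypothetical helpers: naive regex tokenization, NOT the paper's pipeline.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace (rough heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_paragraphs(text: str) -> list[str]:
    """Paragraphs are assumed to be separated by blank lines."""
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]


def words(text: str) -> list[str]:
    return [w.lower() for w in WORD_RE.findall(text)]


def count_syllables(word: str) -> int:
    """Count vowel groups, drop a silent final 'e'; at least 1 per word."""
    n = len(re.findall(r"[aeiouy]+", word.lower()))
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def overlap_stats(sentences: list[str]) -> tuple[float, float]:
    """Mean and population SD of word overlap with the previous sentence.

    Simplification: overlaps all words, not only nouns or verbs.
    """
    sets = [set(words(s)) for s in sentences]
    overlaps = [len(prev & cur) for prev, cur in zip(sets, sets[1:])]
    if not overlaps:
        return 0.0, 0.0
    return statistics.mean(overlaps), statistics.pstdev(overlaps)


def extract_features(text: str, n_errors: int = 0) -> dict[str, float]:
    """Compute a small subset of Table 8 features (illustrative only)."""
    sents = split_sentences(text)
    toks = words(text)
    if not sents or not toks:
        raise ValueError("text must contain at least one word and sentence")
    n_words, n_sents = len(toks), len(sents)
    n_syll = sum(count_syllables(w) for w in toks)
    mean_ov, sd_ov = overlap_stats(sents)
    return {
        # Length features
        "nb_words": n_words,
        "nb_unique_tokens": len(set(toks)),
        "nb_letters": sum(ch.isalpha() for ch in text),
        "nb_sentences": n_sents,
        "nb_paragraphs": len(split_paragraphs(text)),
        "nb_syllables": n_syll,
        # Lexical diversity
        "type_token_ratio": len(set(toks)) / n_words,
        # Error ratio (error count supplied by an external checker)
        "ratio_errors_per_word": n_errors / n_words,
        # Readability: standard Flesch Reading Ease formula
        "words_per_sentence": n_words / n_sents,
        "flesch_reading_ease": 206.835
        - 1.015 * (n_words / n_sents)
        - 84.6 * (n_syll / n_words),
        # Cohesion: distribution parameters of sentence-to-sentence overlap
        "mean_overlap_prev": mean_ov,
        "sd_overlap_prev": sd_ov,
    }


if __name__ == "__main__":
    for name, value in extract_features("The cat sat. The cat ran away.", n_errors=1).items():
        print(f"{name}: {value}")
```

For the two-sentence example, the sketch counts 7 words, 5 unique tokens and 8 syllables. The only sentence pair shares two words ("the", "cat"), so the mean overlap is 2 and the SD is 0. Real feature extractors would use a proper tokenizer, a POS tagger, and a grammar/spell checker in place of these heuristics.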