Table 8.
Feature Types
| Feature type | Features |
|---|---|
| Length features | 1. Nb. of words 2. Nb. of unique tokens 3. Nb. of letters 4. Nb. of sentences 5. Nb. of paragraphs 6. Nb. of syllables |
| Occurrence features | 1. Nb. of nouns 2. Nb. of verbs 3. Nb. of adjectives 4. Nb. of conjunctions 5. Nb. of adverbs 6. Nb. of possessive pronouns 7. Nb. of unique nouns 8. Nb. of unique verbs 9. Nb. of unique adjectives 10. Nb. of unique adverbs 11. Nb. of “wh”-adverbs 12. Nb. of determiners 13. Nb. of lexical words 14. Nb. of unique lexical words 15. Nb. of foreign words 16. Nb. of stopwords 17. Nb. of formal words 18. Nb. of deictic words 19. Nb. of symbols 20. Nb. of punctuation marks |
| Error features | 1. Nb. of errors 2. Nb. of grammar errors 3. Nb. of punctuation errors 4. Nb. of typo errors 5. Ratio Nb. of errors / words 6. Ratio Nb. of grammar errors / words 7. Ratio Nb. of punctuation errors / words 8. Ratio Nb. of typo errors / words |
| Morphological complexity | 1. Nb. of comparatives 2. Nb. of superlatives 3. Nb. of finite verbs 4. Nb. of non-third person singular verbs 5. Nb. of infinitive verbs 6. Ratio of comparatives 7. Ratio of superlatives 8. Ratio of finite verbs 9. Ratio of non-third person singular verbs 10. Ratio of infinitive verbs |
| Cohesion | 1. Nb. of connectors 2. Nb. of unique connectors 3. Mean noun overlap with previous sentence 4. Mean verb overlap with previous sentence 5. SD noun overlap with previous sentence 6. SD verb overlap with previous sentence 7. Ratio of connectors 8. Ratio of unique connectors |
| Readability | 1. Flesch Score 2. Dale-Chall Score 3. Gunning-Fog Index 4. Integration cost 5. Average nb. of sentences per 100 words 6. Average nb. of words per 100 letters 7. Words per sentence 8. Type-token ratio easy words 9. Type-token ratio easy nouns 10. Type-token ratio easy verbs 11. Type-token ratio easy adverbs 12. Type-token ratio easy adjectives 13. Integration cost 14. Heylighen F-Score |
| Lexical diversity | 1. Type-token ratio 2. Type-token ratio nouns 3. Type-token ratio verbs 4. Type-token ratio adjectives 5. Type-token ratio conjunctions 6. Type-token ratio lexical words 7. Type-token ratio functional words 8. Type-token ratio deictic words 9. Type-token ratio “wh”-adverbs 10. Type-token ratio infinitive verbs 11. Global edit distance |
| Lexical sophistication | 1. BNC easy words 2. NGSL easy words 3. SUBLEX easy words 4. BNC easy nouns 5. NGSL easy nouns 6. SUBLEX easy nouns 7. BNC easy verbs 8. NGSL easy verbs 9. SUBLEX easy verbs 10. Brown Frequencies token 11. Brown Frequencies type 12. Brown Frequencies lex. words 13. Brown Frequencies func. words 14. Thorndike Frequencies token 15. Thorndike Frequencies type 16. Thorndike Frequencies lex. words 17. Thorndike Frequencies func. words 18. MRC Frequencies token 19. MRC Frequencies type 20. MRC Frequencies lex. words 21. MRC Frequencies func. words |
| Syntactic complexity | 1. Nb. of subordinate clauses 2. Nb. of fragment sentences 3. Nb. of noun phrases 4. Mean tokens before main verb 5. Nb. of complex noun phrases 6. Nb. of unknown constituents 7. Nb. of postnominal modifiers per complex noun phrase 8. Integration cost 9. Ratio of subordinate clauses 10. Ratio of fragment sentences 11. Ratio of noun phrases 12. SD tokens before main verb 13. Ratio of complex noun phrases 14. Ratio of unknown constituents 15. Ratio of postnominal modifiers per complex noun phrase |
We additionally calculated several ratios and distribution parameters (i.e., means and standard deviations) for some of the features.
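To make the ratio and distribution-parameter features more concrete, the following minimal Python sketch shows how a type-token ratio, a ratio of connectors, and the mean and SD of overlap with the previous sentence might be computed. It is an illustration under stated assumptions, not the pipeline used here: the regex tokenizer, the naive sentence splitter, the `CONNECTORS` list, and all function names are hypothetical, and the overlap is computed over all words rather than over nouns or verbs, since the latter would require a part-of-speech tagger.

```python
import re
import statistics

# Hypothetical connector list, for illustration only.
CONNECTORS = {"however", "therefore", "because", "moreover", "although"}


def tokenize(text):
    """Lowercase word tokens; a simplistic stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence split on '.', '!' and '?' (assumption, not a real segmenter)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def ratio_features(text):
    """Ratio features: counts normalized by the number of word tokens."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return {"type_token_ratio": 0.0, "ratio_connectors": 0.0}
    n_connectors = sum(1 for t in tokens if t in CONNECTORS)
    return {
        "type_token_ratio": len(set(tokens)) / n,
        "ratio_connectors": n_connectors / n,
    }


def overlap_features(text):
    """Mean and sample SD of word overlap between consecutive sentences.

    Uses all words as a simplification; noun or verb overlap would
    need POS tags, which this sketch does not compute.
    """
    sents = [set(tokenize(s)) for s in split_sentences(text)]
    overlaps = [len(a & b) for a, b in zip(sents, sents[1:])]
    if not overlaps:
        return {"mean_overlap": 0.0, "sd_overlap": 0.0}
    sd = statistics.stdev(overlaps) if len(overlaps) > 1 else 0.0
    return {"mean_overlap": statistics.mean(overlaps), "sd_overlap": sd}


if __name__ == "__main__":
    sample = "The cat sat. The cat ran. However, dogs bark."
    print(ratio_features(sample))
    print(overlap_features(sample))
```

Normalizing counts by text length in this way keeps features comparable across texts of different sizes, which is the usual motivation for adding ratios alongside raw counts.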