| # | Candidate predictor variable | Calculation |
|---|---|---|
| 1 | Total characters | Total number of characters including spaces. |
| 2 | Total keystrokes | The minimum number of keystrokes needed to type the text accurately, assuming shift is used rather than caps lock. |
| 3 | Total words | The number of words in the text, defined as groups of characters separated by spaces, rather than the five-character "standard word" convention used when calculating typing speed. |
| 4 | Keystrokes per word | Total keystrokes divided by total words. |
| 5 | Characters per word | Total characters divided by total words. |
| 6 | Mean word proportion | 1 divided by characters per word. |
| 7 | Proportion of words within high-frequency words | Number of words from the text that appear in the top 1000 words list* divided by total words. |
| 8 | Proportion of characters within high-frequency words | Number of characters that are contained in words from the text that appear in the top 1000 words list* divided by total characters. |
| 9 | Mean word frequency | Sum of the language frequencies of each word in the text, divided by number of words. Frequencies from SUBTLEX-US (Brysbaert & New, 2009; ‘FREQcount’ variable). |
| 10 | Proportion of non-words | Number of words in the text that are not recognised in UK, US, AU or CA Hunspell English dictionaries (according to the {hunspell} package; Ooms, 2022) divided by total words. |
| 11 | Proportion of characters within non-words | Number of characters that are contained in words that are not recognised in UK, US, AU or CA dictionaries divided by total characters. |
| 12 | Syllables per word | Total number of syllables (according to the {quanteda.textstats} package; Benoit et al., 2018), divided by total words. This package uses the CMU Pronunciation Dictionary (Carnegie Mellon University, n.d.), and counts vowel clusters for words not in this dictionary. |
| 13 | Bigram frequency | Sum of the language frequencies of each letter pair in the text, divided by number of letter pairs. Frequencies based on Behmer and Crump (2017; 'Frequency' variable). This includes letter pairs only, with no spaces, and is based on approximately 3000 English language eBooks from Project Gutenberg. |
| 14 | Proportion of high frequency bigrams | Number of letter pairs from the text that appear in the top 15 bigrams, divided by number of letter pairs (an alternative approach, akin to the proportion of high-frequency words). Frequencies from Behmer and Crump (2017). |
| 15 | Proportion of character repetitions | Number of character pairs relating to character repetitions (e.g. ‘rr’, ‘..’), divided by number of character pairs. |
| 16 | Proportion of finger repetitions | Number of character pairs relating to finger repetitions (e.g. ‘ed’, ‘k,’), assuming standard touch typing, divided by number of character pairs. |
| 17 | Proportion of hand repetitions | Number of character pairs relating to hand repetitions (e.g. ‘se’, ‘hi’), assuming standard touch typing, divided by number of character pairs. |
| 18 | Proportion of hand alternations | Number of character pairs relating to hand alternations (e.g. ‘qu’, ‘ty’), assuming standard touch typing, divided by number of character pairs. |
| 19 | Proportion of lowercase letter characters | Number of lowercase letters divided by total characters. |
| 20 | Proportion of uppercase letter characters | Number of uppercase letters divided by total characters. |
| 21 | Proportion of numbers | Number of numbers divided by total characters. |
| 22 | Proportion of symbols | Number of symbols (including both punctuation and non-punctuation symbols) divided by total characters. |
| 23 | Proportion of spaces | Number of spaces divided by total characters. |
| 24 | Proportion of lowercase letter non-space characters | Number of lowercase letters divided by total non-space characters. |
| 25 | Proportion of uppercase letter non-space characters | Number of uppercase letters divided by total non-space characters. |
| 26 | Proportion of number non-space characters | Number of numbers divided by total non-space characters. |
| 27 | Proportion of symbol non-space characters | Number of symbols divided by total non-space characters. |
| 28 | Keystrokes per character | Total keystrokes divided by total characters. |
| 29 | Proportion of right-side keys | Number of characters relating to keys on the right-hand side of the keyboard, assuming standard touch typing, divided by total characters. |
| 30 | Mean distance from home row | Sum of each character’s key distance from the eight finger resting keys on the home row, divided by total characters. Distances are based on Krzywinski (n.d.). |

Predictors were calculated according to American English spellings and the ANSI keyboard layout unless stated otherwise. *We used the list of the 1,000 most frequent English words from the Corpus of Contemporary American English (Davies, 2008-), including lemmatisations. For example, “do” is on the core list, so variations such as “doing,” “did,” and “done” were also counted as high-frequency words. Including lemmatisations allows the Typability Index to capture familiarity with core concepts, not just specific word forms. This helps reflect both cognitive familiarity and the ease of typing frequent or commonly recognised words.
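
To make the simpler calculations in the table concrete, the following Python sketch computes a subset of the predictors (rows 1 to 6, 15, 18, 19 to 23 and 28). It is an illustration only, not the authors' pipeline, and it rests on several assumptions that the table does not specify: the function name `typability_predictors`, the output keys, the set of shifted symbols on a US ANSI layout, a simplified left/right hand map, counting one extra keystroke for every shifted character, and treating character pairs that contain an unmapped key (such as a space) as neither repetitions nor alternations.

```python
import string

# Assumption: symbols that require Shift on a US ANSI QWERTY layout.
SHIFTED_SYMBOLS = set('~!@#$%^&*()_+{}|:"<>?')
# Assumption: simplified hand map for standard touch typing
# (letters plus a few right-hand punctuation keys; digits are left unmapped).
LEFT_HAND = set("qwertasdfgzxcvb")
RIGHT_HAND = set("yuiophjklnm,.;/'")


def hand(ch):
    """Return 'L', 'R', or None for keys outside the simplified hand map."""
    c = ch.lower()
    if c in LEFT_HAND:
        return "L"
    if c in RIGHT_HAND:
        return "R"
    return None


def typability_predictors(text):
    """Compute a subset of the candidate predictors for one text."""
    words = text.split()  # groups of characters separated by whitespace
    if not text or not words:
        raise ValueError("text must contain at least one word")

    total_chars = len(text)  # includes spaces
    # Assumption: each uppercase letter or shifted symbol costs one extra
    # keystroke for Shift, even when shifted characters are adjacent.
    shifted = sum(
        1 for ch in text
        if ch in string.ascii_uppercase or ch in SHIFTED_SYMBOLS
    )
    total_keystrokes = total_chars + shifted

    pairs = list(zip(text, text[1:]))  # adjacent character pairs
    n_pairs = len(pairs)
    char_reps = sum(1 for a, b in pairs if a == b)
    alternations = sum(
        1 for a, b in pairs
        if hand(a) is not None and hand(b) is not None and hand(a) != hand(b)
    )

    def prop_chars(charset):
        """Proportion of all characters that belong to charset."""
        return sum(1 for ch in text if ch in charset) / total_chars

    chars_per_word = total_chars / len(words)
    return {
        "total_characters": total_chars,
        "total_keystrokes": total_keystrokes,
        "total_words": len(words),
        "keystrokes_per_word": total_keystrokes / len(words),
        "characters_per_word": chars_per_word,
        "mean_word_proportion": 1 / chars_per_word,
        "keystrokes_per_character": total_keystrokes / total_chars,
        "prop_character_repetitions": char_reps / n_pairs if n_pairs else 0.0,
        "prop_hand_alternations": alternations / n_pairs if n_pairs else 0.0,
        "prop_lowercase": prop_chars(string.ascii_lowercase),
        "prop_uppercase": prop_chars(string.ascii_uppercase),
        "prop_numbers": prop_chars(string.digits),
        "prop_symbols": prop_chars(string.punctuation),
        "prop_spaces": prop_chars(" "),
    }


example = typability_predictors("Hello, World!")
print(example["total_keystrokes"])     # 16: 13 characters plus Shift for H, W and !
print(example["characters_per_word"])  # 6.5
```

The frequency-based and dictionary-based predictors (rows 7 to 14) are omitted because they depend on the external word lists, corpora and packages cited in the table. The non-space variants (rows 24 to 27) would follow by changing the denominator in `prop_chars` to the count of non-space characters.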