Table 3. Most common unigrams, bigrams, and emojis after removing stop words, punctuation, and numbers.
Stop words were removed using NLTK (Bird, Klein & Loper, 2009). Most unigrams and bigrams can have several English translations depending on the context. The table provides only one translation option.
| Unigram (Russian) | Unigram (English) | Count | Bigram (Russian) | Bigram (English) | Count | Emoji | Count |
|---|---|---|---|---|---|---|---|
| это | it | 1,117 | доброе утро | good morning | 39 | | 443 |
| просто | simply | 355 | спокойной ночи | good night | 26 | | 313 |
| спасибо | thanks | 306 | спасибо большое | thanks a lot | 24 | | 246 |
| хочу | want | 249 | самом деле | actually | 23 | | 240 |
| ещё | yet | 223 | это просто | it’s simple | 23 | | 120 |
| почему | why | 209 | опубликовано фото | published photo | 18 | | 119 |
| очень | very | 205 | сих пор | so far | 17 | | 118 |
| всё | all | 204 | руб г | rub g | 16 | | 113 |
| блять | fuck | 184 | днем рождения | birthday | 15 | | 104 |
| вообще | generally | 174 | все ещё | still | 13 | | 100 |
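
The counting behind Table 3 can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the tokenizer regex, the function names `tokenize` and `count_ngrams`, the sample texts, and the tiny `STOP_WORDS` set are all assumptions made for the example. The source states only that stop words were removed with NLTK, whose Russian list is available as `nltk.corpus.stopwords.words("russian")` once the `stopwords` corpus is downloaded. The sketch forms bigrams after filtering, which would be consistent with entries such as "самом деле" and "сих пор" that lack their leading preposition, although the source does not say so explicitly.

```python
import re
from collections import Counter

# Stand-in stop list for illustration only (an assumption). The source used
# NLTK's list, e.g. nltk.corpus.stopwords.words("russian").
STOP_WORDS = {"на", "до", "с", "и", "в", "не", "я"}

# Runs of letters only: punctuation, digits, and emojis are not matched.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]


def count_ngrams(texts):
    """Return (unigram_counts, bigram_counts) over an iterable of texts."""
    unigrams, bigrams = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        unigrams.update(tokens)
        # Bigrams are built per text after filtering, so they never span two
        # texts but may join words that a stop word originally separated.
        bigrams.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return unigrams, bigrams


# Hypothetical sample texts, not taken from the dataset.
sample = ["На самом деле спасибо большое!", "Спасибо большое, доброе утро 2 раза"]
uni, bi = count_ngrams(sample)
print(uni.most_common(3))
print(bi.most_common(3))
```

Calling `most_common(10)` on each counter would give the ten rows per column shown in the table. Emoji counts would need a separate pass, since the letter-only regex deliberately skips them.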