PeerJ Computer Science. 2022 Jul 19;8:e1039. doi: 10.7717/peerj-cs.1039

RuSentiTweet: a sentiment analysis dataset of general domain tweets in Russian

Sergey Smetanin
Editor: Yilun Shang
PMCID: PMC9454938  PMID: 36092008

Abstract

The Russian language is still not as well-resourced as English, especially in the field of sentiment analysis of Twitter content. Although several sentiment analysis datasets of tweets in Russian exist, they are either annotated automatically, annotated manually by a single annotator (and thus lack inter-annotator agreement), or focused on a specific domain. In this article, we present RuSentiTweet, a new sentiment analysis dataset of general domain tweets in Russian. RuSentiTweet is currently the largest in its class for Russian, with 13,392 tweets manually annotated with moderate inter-rater agreement into five classes: Positive, Neutral, Negative, Speech Act, and Skip. As a data source, we used the Twitter Stream Grab, a historical collection of tweets obtained from the general Twitter API stream, which provides a 1% sample of all public tweets. Additionally, we released a RuBERT-based sentiment classification model that achieved a macro F1 of 0.6594 on the test subset.

Keywords: Sentiment dataset, Sentiment analysis, Russian

Introduction

Over the past ten years, Twitter has become established as a major research platform, utilized in more than ten thousand research articles, and sentiment analysis has proven to be one of its major research areas (Antonakaki, Fragopoulou & Ioannidis, 2021). As expected, there is interest in sentiment analysis of the Russian-speaking segment of Twitter, not only for training machine learning (ML) models (Kotelnikova, 2020; Araslanov, Komotskiy & Agbozo, 2020; Kanev et al., 2022), but also for applied research, such as studying migration issues (Borodkina & Sibirev, 2019), measuring reactions to different events (Kirilenko & Stepchenkova, 2017; Kausar, Soosaimanickam & Nasar, 2021), and monitoring public sentiment (Chizhik, 2016; Smetanin, 2017). However, although several datasets of tweets in Russian exist (Smetanin, 2020a), they are either annotated automatically (e.g., RuTweetCorp by Rubtsova (2013)), annotated by a single annotator and thus lacking inter-annotator agreement (e.g., Twitter Sentiment for 15 European Languages by Mozetič, Grčar & Smailović (2016)), or focused on a specific domain (e.g., SentiRuEval-2015 by Loukachevitch et al. (2015)). Thus, the research community lacks a general domain sentiment dataset of tweets in Russian that is annotated manually with a reported inter-rater agreement score.

In this article, we present RuSentiTweet, a new sentiment analysis dataset of 13,392 general domain tweets in Russian. RuSentiTweet was annotated manually according to the RuSentiment guidelines (Rogers et al., 2018) into five classes (Positive, Neutral, Negative, Speech Act, and Skip) with moderate inter-rater agreement. The practical and academic contribution of this study is threefold. Firstly, we reviewed the existing public sentiment datasets of tweets in Russian. Secondly, we filled the data gap by introducing RuSentiTweet, the only manually annotated dataset of general domain tweets for the Russian language. Lastly, we trained several ML models to provide further research with strong baselines.

The rest of the article is organized as follows. In “Related Work”, we review related research, identify existing public sentiment datasets of tweets in Russian, and confirm the importance of a new dataset of general domain tweets in Russian. In “Sentiment Dataset”, we describe the creation of RuSentiTweet. In “Sentiment Classification Baseline”, we document the training of several ML models to provide the research community with public baselines. In “Conclusion”, we present conclusions from this study.

Related work

As of 2022, Russian was the eighth most widely spoken language worldwide, with a total of 258.2 million speakers (Szmigiera, 2022). Yet, as reported in the preliminary results of the All-Russian Census 2020 (Rosstat, 2022), only about 147 million people permanently live in Russia. In addition to Russia, where Russian is the official language, it is also widely spoken in a number of other countries that were part of the USSR. According to various sources (Arefiev, 2013; Lopatin & Ulukhanov, 2017), there are between 52 and 94 million native speakers of Russian in these countries. Many Russian speakers also live elsewhere, including Europe, the USA, Canada, and Israel (Lopatin & Ulukhanov, 2017). Given the significant Russian-speaking population and the ever-growing level of Internet penetration, texts published by Russian-speaking users on social networks are attracting increasing attention from researchers. As a result, new works appear every year both in the classical sentiment analysis of Russian-language content (e.g., Araslanov, Komotskiy & Agbozo, 2020; Kanev et al., 2022; Kausar, Soosaimanickam & Nasar, 2021) and in related areas, such as emotion identification (e.g., Babii, Kazyulina & Malafeev, 2020; Kazyulina, Babii & Malafeev, 2020; Babii, Kazyulina & Malafeev, 2021), toxicity and hate speech detection (e.g., Zueva, Kabirova & Kalaidin, 2020; Pronoza et al., 2021; Smetanin & Komarov, 2021b), and inappropriate language identification (e.g., Babakov et al., 2021; Babakov, Logacheva & Panchenko, 2022).

However, the Russian language is not as well-resourced as English (Besacier et al., 2014), especially in the field of sentiment analysis (Smetanin & Komarov, 2021a), so the data options for researchers are quite limited. In our previous study (Smetanin, 2020a), we identified 14 publicly available sentiment analysis datasets of Russian texts. In that study, we considered only datasets that can be accessed via the instructions in their original papers or official websites. Following this strategy, we omitted several existing datasets, such as the ROMIP datasets (Chetviorkin, Braslavskiy & Loukachevich, 2013; Chetviorkin & Loukachevitch, 2013), because we were unable to obtain access to them. Among these 14 datasets, only six were constructed from Twitter content, so we selected them for further detailed analysis. Additionally, we analysed the most recent review of sentiment analysis datasets of Russian texts by Kotelnikov (2021) but did not find any new Twitter datasets for consideration.

As can be seen from Table 1, RuTweetCorp (Rubtsova, 2013) is the largest sentiment analysis dataset of general domain tweets in Russian, but it was annotated automatically following the strategy proposed by Read (2005): each tweet was assigned a sentiment class based on the emoticons it contains. As a consequence, even a simple rule-based approach relying on the presence of the ‘(’ character can achieve F1 = 97.39% in the binary (Positive vs. Negative) classification task (Smetanin & Komarov, 2021a). SemEval-2016 Task 5 Russian (Pontiki et al., 2016), SentiRuEval-2016 (Lukashevich & Rubtsova, 2016), and SentiRuEval-2015 (Loukachevitch et al., 2015) are manually annotated and widely used datasets, but they are all tied to specific domains such as restaurants, automobiles, telecommunication companies, or banks. Twitter Sentiment for 15 European Languages (Mozetič, Grčar & Smailović, 2016) is a manually annotated sentiment analysis dataset, but only one annotator was engaged for the Russian-language tweets; thus, there is no inter-annotator agreement. The Kaggle dataset does not report its data collection or annotation procedure. Thus, there is a lack of a general domain sentiment dataset of tweets in Russian that is annotated manually with a reported inter-rater agreement score.

Table 1. Sentiment analysis datasets of Russian language texts.

A more detailed description of each dataset can be found in Smetanin (2020a), Smetanin & Komarov (2021a), and Kotelnikov (2021), as well as in the original papers (if published). For datasets that contain several subsets from different data sources, we indicate only the subsets made from tweets.

Dataset Data source Domain Annotation Classes Size Link
Twitter Sentiment for 15 European Languages (Mozetič, Grčar & Smailović, 2016) Twitter General Manual 3 107,773 Project page
SemEval-2016 Task 5 Russian (Pontiki et al., 2016) Twitter Restaurants Manual 3 405 Project page
SentiRuEval-2016 (Lukashevich & Rubtsova, 2016) Twitter Telecom and banks Manual 3 23,595 Project page
SentiRuEval-2015 (Loukachevitch et al., 2015) Twitter Telecom and banks Manual 4 16,318 Project page
RuTweetCorp (Rubtsova, 2013) Twitter General Automatic 3 334,836 Project page
Kaggle Russian_twitter_sentiment Twitter n/a n/a 2 226,832 Kaggle page

Sentiment dataset

Data collection

As a data source of tweets in Russian, we decided to use the Twitter Stream Grab (https://archive.org/details/twitterstream), a publicly available historical collection of JSON records grabbed from the general Twitter “Spritzer” API stream. According to Twitter, this API provides a 1% sample of all public tweets and is not tied to a specific topic, so we considered it a good source of general domain tweets. Additionally, several studies (Wang, Callan & Zheng, 2015; Leetaru, 2019) performed independent validation of the representativeness of this stream. Since the Twitter Stream Grab contains tweets in many languages, our first step was to remove tweets written in languages other than Russian. Each tweet from this data source already contained the language of the text as automatically detected by Twitter (see Footnote 1), so the language filtering procedure was fairly straightforward.
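As an illustration, the filtering step can be sketched in Python as follows, assuming the Twitter Stream Grab archives have been unpacked into newline-delimited JSON files; the file layout and function name are assumptions, not part of the original pipeline.

```python
import json

def read_russian_tweets(path):
    """Yield tweets whose language was automatically detected as Russian."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                tweet = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed or truncated records in the archive
            # The `lang` field is filled in by Twitter's own language detector.
            if tweet.get("lang") == "ru":
                yield tweet
```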

We downloaded the Twitter Stream Grab for 12 months, from January 2020 to December 2020 (see Footnote 2). The main motivation for choosing an entire year as the interval was to cover all months of the year and thereby minimize the effect of seasonality. Previous research has shown that there are daily (Larsen et al., 2015; Prata et al., 2016), weekly (Ten Thij, Bhulai & Kampstra, 2014; Dzogang, Lightman & Cristianini, 2017b), and seasonal (Dzogang et al., 2017a) patterns of sentiment or emotion expression on Twitter. It has also been found (Baylis et al., 2018; Baylis, 2020) that expressed sentiment correlates with weather, which in turn tends to depend on the season. After excluding retweets and filtering by language, we obtained ∼4.5M tweets in Russian. Since manual labelling of such a volume of tweets is costly and extremely time-consuming, we randomly selected 15,000 tweets for further annotation, distributed evenly over the selected months.
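A minimal sketch of this sampling step follows; the `created_at` and `retweeted_status` fields are standard in Twitter's v1.1 JSON, while the seed and helper name are hypothetical.

```python
import random
from collections import defaultdict
from datetime import datetime

def sample_evenly_by_month(tweets, total=15_000, seed=0):
    """Draw an equal number of non-retweet tweets from each month."""
    by_month = defaultdict(list)
    for tweet in tweets:
        if "retweeted_status" in tweet:
            continue  # exclude retweets, as described above
        created = datetime.strptime(tweet["created_at"],
                                    "%a %b %d %H:%M:%S %z %Y")
        by_month[created.month].append(tweet)
    per_month = total // len(by_month)  # 1,250 per month over 12 months
    rng = random.Random(seed)
    sample = []
    for month_tweets in by_month.values():
        sample.extend(rng.sample(month_tweets, per_month))
    return sample
```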

Data annotation

Guidelines

As per the recommendations outlined in our previous study (Smetanin, 2020a), we decided to use the RuSentiment (Rogers et al., 2018) annotation guidelines (https://github.com/text-machine-lab/rusentiment/tree/master/Guidelines). To the best of our knowledge, this is the only set of publicly available sentiment annotation guidelines designed for the Russian language. The guidelines are described in detail in the original RuSentiment paper, so this section provides only a brief summary.

The annotation guidelines cover both implicit and explicit expressions of external attitude (evaluation) and internal emotional state (mood). They define the following five sentiment classes.

  • Negative represents both explicit and implicit negative sentiment or attitude towards something.

  • Neutral represents texts that simply describe some situation in a neutral, matter-of-fact way and have no clear positive or negative sentiment. This class also includes commercial information, factual questions, objective descriptions, and summaries.

  • Positive represents both explicit and implicit positive sentiment or attitude towards something.

  • Speech Act represents texts that perform the functions of various speech acts—such as greeting someone, congratulating someone, and expressing gratitude for something. Although these texts also represent a positive sentiment, they are treated as a separate subcategory because they can also be performed under social pressure or out of a feeling of obligation (Rogers et al., 2018).

  • Skip represents noisy and unclear sentiment or attitude towards something—such as when the original meaning is impossible to ascertain without additional context, the sentiment of the texts as a whole is not entirely clear, the text is not in Russian, or the text contains jokes.

Texts containing irony were annotated with the dominant sentiment, which was most commonly negative. Hashtags were treated as information units similar to basic words or phrases. Emoticons were not treated as standalone sentiment labels but were analysed in combination with the whole text to identify the dominant sentiment.

Crowdsourcing platform

The annotation was performed via Yandex.Toloka (https://toloka.ai/), a Russian crowdsourcing platform with a high share of Russian-speaking workers. Yandex.Toloka is widely used in studies of Russian-language content, for example for the annotation of semantic change (Rodina & Kutuzov, 2020), question answering (Korablinov & Braslavski, 2020), and toxic comments (Smetanin, 2020b). A depiction of the Yandex.Toloka user interface can be found in Fig. 1. We required annotators to pass training before starting annotation, and during the annotation of the dataset their work was continuously evaluated through honeypots (control tasks with known answers). As training samples and control pairs, we selected texts from RuSentiment, which was annotated using the same guidelines. The threshold was 60% correctly annotated samples for training and 80% for honeypots. We selected only Russian-speaking annotators who had passed an internal exam (https://toloka.ai/ru/docs/guide/concepts/filters.html) on language knowledge.

Figure 1. An example of the user interface for annotators in Yandex.Toloka, in Russian (left) with its English translation (right).


The green block with quotation marks contains the text of the tweet. Under the block with the text, there are numbered sentiment classes, where 1 is Negative, 2 is Neutral, 3 is Positive, 4 is Speech Act, and 5 is Skip. Numbers are used as hotkeys during annotation.

Aggregation

Following the RuSentiment aggregation strategy, a tweet was deemed to belong to a class if at least two of the three annotators attributed it to that class. In cases where all three annotators disagreed, the tweet was removed from the dataset as extremely noisy and unclear (see examples in Table A1). Out of the initially selected 15,000 tweets, 1,608 received three different annotations, so we excluded them from the final dataset. Thus, the final dataset consists of 13,392 tweets with the following class distribution: 3,298 (24.62%) Negative, 5,341 (39.88%) Neutral, 2,414 (18.02%) Positive, 1,843 (13.76%) Skip, and 496 (3.70%) Speech Act tweets. We split the dataset into training (80%) and test (20%) subsets using stratified random sampling by class label.
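A minimal sketch of this aggregation and split, assuming the raw annotations sit in a hypothetical CSV with columns ann1, ann2, and ann3:

```python
from collections import Counter

import pandas as pd
from sklearn.model_selection import train_test_split

def majority_label(labels):
    """Return the label chosen by at least two of three annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None

raw = pd.read_csv("annotations.csv")  # hypothetical file name
raw["label"] = raw[["ann1", "ann2", "ann3"]].apply(majority_label, axis=1)
dataset = raw.dropna(subset=["label"])  # drops the all-disagree tweets

train, test = train_test_split(
    dataset, test_size=0.2, stratify=dataset["label"], random_state=0
)
```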

Table A1. Examples of tweets with no agreement between annotators.
Tweet Annotation
Russian English Annotator 1 Annotator 2 Annotator 3
ю ноу блин yu know damn it Skip Neutral Negative
@USER доброе утро всем дэдди сегодня @USER good morning daddies to everyone today Speech Positive Skip
Путешествуем по Уэльсу. не уважать мужчин Traveling in Wales. disrespect men Negative Neutral Skip
тот факт что в энимал кроссинге так мало прикольных мышиных жителей the fact that there are so few funny mouse inhabitants in animal crossing Negative Positive Neutral
Кто в нижнем родился, в верхнем не сгодился. Who was born in the bottom, did not fit in the top. Negative Neutral Skip

Inter-annotator agreement

For measuring the inter-annotator agreement, we calculated Krippendorff’s α coefficient (Krippendorff, 1980) because it applies to any number of annotators and categories, as well as to missing or incomplete data (Krippendorff, 2004). For most inter-annotator agreement indices, including Krippendorff’s α, a cutoff value of 0.8 is commonly suggested as a marker of good reliability, with the range from 0.667 to 0.8 allowing for tentative conclusions and values below 0.667 indicating poor agreement (Beckler et al., 2018). However, in a systematic review of crowd-sourced annotation in social computing, Salminen et al. (2018) reported that agreement scores in social computing studies are not high, averaging around 0.60 for both Kappa and Alpha metrics, which is lower than the typical threshold values. The authors highlighted that annotation in social computing tends to be subjective rather than objective, and the more subjective the task, the worse the agreement, regardless of the metric used. Though it is important to report inter-rater agreement scores, there are suggestions that the results can be misleading in social computing (Hillaire, 2021). In fact, low agreement in this case does not necessarily mean the opinions of annotators are incorrect; it may simply indicate that they have different opinions (Salminen et al., 2018; Hillaire et al., 2021). Sentiment annotation is by nature a subjective task because the annotator must subjectively (albeit with guidelines) identify the sentiment and emotions expressed by the author rather than just objectively analyse narrated events or situations, so we can expect annotators to have different subjective understandings of the emotion expressed in a particular text. Thus, considering that the mean score in the field of social computing is 0.60 (Salminen et al., 2018), we followed the same approach as Hillaire (2021) and adopted the less conservative interpretation of inter-rater agreement by Landis & Koch (1977), which suggests the following interpretations.

  • Scores from 0.0 to 0.2 indicate a slight agreement.

  • Scores from 0.21 to 0.40 indicate a fair agreement.

  • Scores from 0.41 to 0.60 indicate a moderate agreement.

  • Scores from 0.61 to 0.80 indicate a substantial agreement.

  • Scores from 0.81 to 1.0 indicate almost perfect or perfect agreement.

We calculated Krippendorff’s α with binary distance (i.e., all distinct classes are equally distant from each other) using the NLTK library (Bird, Klein & Loper, 2009) and obtained a score of 0.5048, which can be interpreted as moderate agreement between annotators. We considered this level of agreement satisfactory for our case, since other five-class sentiment datasets reported similar or even lower levels of agreement, such as Blog Track at TREC 2008 (α = 0.4219, five classes) (Bermingham & Smeaton, 2009), LINIS Crowd (α = 0.541, five classes) (Koltsova, Alexeeva & Kolcov, 2016), RuSentiment (Fleiss’ kappa of 0.58, five classes) (Rogers et al., 2018), sentiment@USNavy (α = 0.592, four classes) (Fiok et al., 2021), and NaijaSenti (Fleiss’ kappa of 0.434 to 0.555, five classes) (Muhammad et al., 2022). Additionally, we calculated Krippendorff’s α with an interval distance that takes the distance between classes into account: for example, the Neutral and Positive classes are closer to each other than the Negative and Positive classes. The distance matrix is presented in Table 2. The Krippendorff’s α coefficient with interval distance was 0.5601, which can be interpreted as slightly higher but still moderate agreement.

Table 2. Distance between classes for interval Krippendorff’s α, where 0 means the classes are the same, 1 means the classes are close to each other, and 2 means the classes are far away from each other.
Class Negative Neutral Positive Speech Skip
Negative 0 1 2 2 1
Neutral 1 0 1 1 1
Positive 2 1 0 0 1
Speech 2 1 0 0 1
Skip 1 1 1 1 0

Note:

Positive and Speech classes have zero distance between them; they both represent positive sentiment as per RuSentiment guidelines.
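Both agreement scores can be reproduced with NLTK's AnnotationTask, using the built-in binary distance and a custom distance that encodes Table 2; the annotation triples below are illustrative only, not taken from the dataset.

```python
from nltk.metrics.agreement import AnnotationTask
from nltk.metrics.distance import binary_distance

# Table 2 as a symmetric distance lookup (diagonal is zero).
ORDER = ["Negative", "Neutral", "Positive", "Speech", "Skip"]
MATRIX = [
    [0, 1, 2, 2, 1],
    [1, 0, 1, 1, 1],
    [2, 1, 0, 0, 1],
    [2, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
]

def table2_distance(a, b):
    return MATRIX[ORDER.index(a)][ORDER.index(b)]

# Each record is (annotator_id, item_id, label).
data = [
    ("ann1", "tweet1", "Positive"), ("ann2", "tweet1", "Positive"),
    ("ann3", "tweet1", "Speech"),
    ("ann1", "tweet2", "Negative"), ("ann2", "tweet2", "Neutral"),
    ("ann3", "tweet2", "Negative"),
]
alpha_binary = AnnotationTask(data=data, distance=binary_distance).alpha()
alpha_table2 = AnnotationTask(data=data, distance=table2_distance).alpha()
```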

Exploratory analysis

The average text length is 59.36 characters across all texts: 67.52 for Negative, 59.29 for Neutral, 57.85 for Positive, 42.71 for Speech, and 51.41 for Skip. As can be seen from Fig. 2, texts of just a couple of characters are extremely rare, but the frequency grows rapidly as the number of characters increases, peaks at text lengths of 20 to 40 characters, and then gradually decreases. Interestingly, for some classes there is a moderate Pearson’s correlation between text length and the proportion of texts of that class among all texts. The Negative class has a moderate positive correlation (ρ = 0.68, p < 0.01) with text length, whereas the Speech (ρ = −0.52, p < 0.01) and Skip (ρ = −0.62, p < 0.01) classes have moderate negative correlations. At the same time, the Neutral (ρ = −0.03, p = 0.70) and Positive (ρ = −0.04, p = 0.62) classes show no statistically significant correlation. The most common unigrams, bigrams, and emojis can be found in Table 3.
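A minimal sketch of this correlation test, assuming a pandas DataFrame with hypothetical `text` and `label` columns: for every observed text length, we compute the share of tweets of the target class and correlate that share with the length using SciPy.

```python
import pandas as pd
from scipy.stats import pearsonr

def length_class_correlation(df: pd.DataFrame, target_class: str):
    """Correlate text length with the share of `target_class` at that length."""
    df = df.assign(length=df["text"].str.len())
    share_by_length = df.groupby("length")["label"].apply(
        lambda labels: (labels == target_class).mean()
    )
    rho, p_value = pearsonr(share_by_length.index, share_by_length.values)
    return rho, p_value

# Example: rho, p = length_class_correlation(dataset, "Negative")
```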

Figure 2. Text length distribution.


Table 3. Most common unigrams, bigrams, and emojis without stop words, punctuation, and numbers.

Stop words were removed using NLTK (Bird, Klein & Loper, 2009). Most unigrams and bigrams can have several English translations depending on the context. The table provides only one translation option.

Unigram Bigram Emoji
Item Count Item Count Item Count
Russian English Russian English
это it 1,117 доброе утро good morning 39 [emoji image] 443
просто simply 355 спокойной ночи good night 26 [emoji image] 313
спасибо thanks 306 спасибо большое thanks a lot 24 [emoji image] 246
хочу want 249 самом деле actually 23 [emoji image] 240
ещё yet 223 это просто it’s simple 23 [emoji image] 120
почему why 209 опубликовано фото published photo 18 [emoji image] 119
очень very 205 сих пор so far 17 [emoji image] 118
всё all 204 руб г rub g 16 [emoji image] 113
блять fuck 184 днем рождения birthday 15 [emoji image] 104
вообще generally 174 все ещё still 13 [emoji image] 100

As mentioned in “Related Work”, one of the key limitations of RuTweetCorp (Rubtsova, 2013), the largest automatically annotated dataset of tweets in Russian, is that its Positive and Negative tweets can be separated with F1 = 97.39% by a simple rule-based approach relying on the presence of the ‘(’ character. We checked whether this limitation also applies to RuSentiTweet. Applying this rule-based approach to the Positive and Negative tweets from RuSentiTweet yielded F1 = 0.3450 (i.e., approximately the same result as random classification), thereby confirming that RuTweetCorp’s limitation does not apply to RuSentiTweet.
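A minimal sketch of this check, where the presence of ‘(’ (a common sad-emoticon marker in Russian tweets) predicts Negative; the macro averaging is an assumption, and the two example tweets are illustrative.

```python
from sklearn.metrics import f1_score

def bracket_rule(texts):
    """Predict Negative if a tweet contains '(', otherwise Positive."""
    return ["Negative" if "(" in text else "Positive" for text in texts]

# `texts` and `labels` should contain only the Positive and Negative tweets.
texts = ["ну почему всё так((", "какой чудесный день)"]
labels = ["Negative", "Positive"]
print(f1_score(labels, bracket_rule(texts), average="macro"))
```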

Sentiment classification baseline

Model selection

As demonstrated in our recent study (Smetanin & Komarov, 2021a), sentiment analysis of Russian-language texts based on language models tends to outperform rule-based and basic ML-based approaches in terms of classification quality. This observation is also supported by other studies (Golubev & Loukachevitch, 2020; Kotelnikova, 2020; Konstantinov, Moshkin & Yarushkina, 2021). Based on these papers, we decided to fine-tune RuBERT (Kuratov & Arkhipov, 2019), a version of BERT (Devlin et al., 2019) trained on the Russian part of Wikipedia and Russian news. Over the past few years, this model has been actively used in sentiment analysis studies on the Russian language and has consistently demonstrated strong or even new state-of-the-art (SOTA) results (Golubev & Loukachevitch, 2020; Kotelnikova, 2020; Konstantinov, Moshkin & Yarushkina, 2021; Smetanin & Komarov, 2021a). For comparison, we also trained a more classical ML classifier for the sentiment analysis task: Multinomial Naive Bayes (MNB). We used the MNB implementation (https://github.com/sismetanin/sentiment-analysis-of-tweets-in-russian) from our previous paper (Smetanin & Komarov, 2019).
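A minimal sketch of such an MNB pipeline, using the hyperparameters listed in the Results section below; the exact implementation in the cited repository may differ in preprocessing details.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Word unigrams and bigrams with TF-IDF weighting and alpha = 0.01.
mnb = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    MultinomialNB(alpha=0.01),
)
# mnb.fit(train["text"], train["label"])
# predictions = mnb.predict(test["text"])
```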

Results

During the training stage for RuBERT, we relied on the approach used in Smetanin & Komarov (2021a). Fine-tuning was performed using the Transformers library (Wolf et al., 2020) on a single Tesla V100 SXM2 32 GB GPU with the following parameters: four training epochs, a maximum sequence length of 128, a batch size of 32, and a learning rate of 5e−5. Since our goal was to provide a baseline classification model rather than the most efficient one, we did not search for the most efficient training parameters. We repeated each experiment three times and report the mean values of the measurements. For MNB, we used the same parameters as in our previous paper (Smetanin & Komarov, 2019): a combination of unigrams and bigrams, a TF-IDF vectorizer, and an alpha of 0.01.
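A sketch of the RuBERT fine-tuning setup with the parameters above, using the Transformers Trainer API; the checkpoint name and dataset wiring are assumptions, and `train_ds` stands for a tokenized training split.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

CHECKPOINT = "DeepPavlov/rubert-base-cased"  # a public RuBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT, num_labels=5
)

def tokenize(batch):
    # Maximum sequence length of 128, as in the paper.
    return tokenizer(batch["text"], truncation=True,
                     max_length=128, padding="max_length")

args = TrainingArguments(
    output_dir="rubert-rusentitweet",
    num_train_epochs=4,              # four training epochs
    per_device_train_batch_size=32,  # batch size of 32
    learning_rate=5e-5,              # learning rate of 5e-5
)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds)
# trainer.train()
```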

According to the results presented in Table 4, RuBERT outperformed MNB, as expected, and showed the best classification scores. The classification results obtained on RuSentiTweet are slightly lower but still comparable with the results obtained in other studies on RuSentiment (see Table 5): RuBERT achieved a weighted F1 of 0.7263 on RuSentiment (Kuratov & Arkhipov, 2019), whereas on our dataset this model achieved a weighted F1 of 0.6675. The difference in the results could be caused by the size of the dataset, as RuSentiment is more than two times bigger. The classification metrics of five-class sentiment analysis approaches on other datasets and languages can be found in Table 5. Although direct comparison across different datasets and languages may not be entirely correct, we can see that our approach at least matches the order of magnitude of the average score for five-class classification.

Table 4. Five-class sentiment classification on RuSentiTweet.

Model Precision Recall F1 (macro) F1 (weighted)
RuBERT 0.6793 0.6449 0.6594 0.6675
MNB 0.5867 0.5021 0.5216 0.5189

Table 5. Five-class sentiment classification studies.

Study Dataset Model Classification metrics
Accuracy Precision Recall F1 (macro) F1 (weighted)
Muhammad et al. (2022) NaijaSenti XLM-R-base+LAFT n/a n/a n/a n/a 0.795
Muhammad et al. (2022) NaijaSenti M-BERT+LAFT n/a n/a n/a n/a 0.7700
Fiok et al. (2021) sentiment@USNavy BART large + CNN n/a n/a n/a 0.596 n/a
Smetanin & Komarov (2021a) RuSentiment M-BERT-Base n/a 0.6722 0.6907 0.6794 0.7244
Smetanin & Komarov (2021a) RuSentiment RuBERT n/a 0.7089 0.7362 0.7203 0.7571
Smetanin & Komarov (2021a) RuSentiment M-USE-CNN n/a 0.6571 0.6708 0.6627 0.7105
Smetanin & Komarov (2021a) RuSentiment M-USE-Trans n/a 0.6821 0.6982 0.6860 0.7342
Jamadi Khiabani, Basiri & Rastegari (2020) TripAdvisor Dempster–Shafer-based model 0.79 0.5 0.47 0.49 n/a
Jamadi Khiabani, Basiri & Rastegari (2020) CitySearch Dempster–Shafer-based model 0.79 0.48 0.48 0.48 n/a
Kuratov & Arkhipov (2019) RuSentiment Multilingual BERT n/a n/a n/a n/a 0.7082
Kuratov & Arkhipov (2019) RuSentiment RuBERT n/a n/a n/a n/a 0.7263
Baymurzina, Kuznetsov & Burtsev (2019) RuSentiment SWCNN + fastText Twitter n/a n/a n/a n/a 0.7850
Baymurzina, Kuznetsov & Burtsev (2019) RuSentiment BiGRU + ELMo Wiki n/a n/a n/a n/a 0.6947
Tripto & Ali (2018) YouTube LSTM 0.5424 n/a n/a 0.5320 n/a
Li et al. (2018) Twitter Logistic Regression 0.6899 0.6053 0.6899 0.6354 n/a
Ahmadi et al. (2017) SST-5 RNTN 0.41 n/a n/a 0.32 n/a
Buntoro, Adji & Purnamasari (2016) Twitter Naïve Bayes 0.7177 0.716 0.718 n/a n/a
Aly & Atiya (2013) LABR SVM 0.503 n/a n/a n/a 0.491
Chetviorkin & Loukachevitch (2013) ROMIP-2012 (Movies) n/a 0.407 n/a n/a 0.377 n/a
Blinov, Kotelnikov & Pestov (2013) ROMIP-2012 (Books) SVM 0.481 0.339 0.496 0.402 n/a
Chetviorkin & Loukachevitch (2013) ROMIP-2012 (Cameras) n/a 0.480 n/a n/a 0.336 n/a
Pak & Paroubek (2012) ROMIP-2011 (Movies) SVM 0.599 n/a n/a 0.286 n/a
Pak & Paroubek (2012) ROMIP-2011 (Books) SVM 0.622 n/a n/a 0.291 n/a
Pak & Paroubek (2012) ROMIP-2011 (Cameras) SVM 0.626 n/a n/a 0.342 n/a

Note:

We selected only those studies that considered five sentiment classes and reported at least one of the following classification measures: Precision, Recall, macro F1, or weighted F1. Among all datasets, only the ROMIP (Chetviorkin, Braslavskiy & Loukachevich, 2013; Chetviorkin & Loukachevitch, 2013) and RuSentiment (Rogers et al., 2018) datasets are in Russian.

We made our RuBERT-based model publicly available (https://huggingface.co/sismetanin/rubert-rusentitweet) to the research community.
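The released model can be loaded for inference as sketched below; the id-to-label mapping should be checked against the model's configuration in the repository.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NAME = "sismetanin/rubert-rusentitweet"
tokenizer = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForSequenceClassification.from_pretrained(NAME)

inputs = tokenizer("Какой чудесный день!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label.get(predicted_id, predicted_id))
```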

Error analysis

Considering that RuBERT clearly outperformed MNB, we performed error analysis only for RuBERT. As can be seen from the confusion matrix for RuBERT (see Fig. 3), the Skip class was one of the most poorly classified classes, since it initially consisted of barely interpretable and noisy tweets. The Speech Act class was clearly distinguished from the Negative and Neutral classes because it consists of a well-defined group of speech constructs, but it was commonly misclassified as Positive because it also conveys positive sentiment. Predictably, the Neutral class was commonly misclassified as Positive or Negative because neutral sentiment logically lies between positive and negative sentiment. As highlighted by Barnes, Øvrelid & Velldal (2019), the misclassification of neutral sentiment tends to be a general challenge of non-binary sentiment classification. In general, the misclassification errors of our model were quite similar to the RuSentiment misclassification errors reported in our previous study (Smetanin & Komarov, 2021a) (see Fig. 4), most likely because the same annotation guidelines and models were used. The most noticeable difference was in the recall for the Speech class: on RuSentiment, it was much better separated from the other classes, with recall in the interval from 0.88 to 0.96 (Smetanin & Komarov, 2021a). We suppose that the reason for this difference is the number of texts in this class: RuSentiment contains 3,467 Speech texts, whereas RuSentiTweet contains only 480. Examples of classified and misclassified tweets can be found in Table A2.
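The confusion matrices in Figs. 3 and 4 can be reproduced from model predictions with scikit-learn; `test_labels` and `test_predictions` below are placeholders for the gold labels of the test subset and the model outputs.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

CLASSES = ["Negative", "Neutral", "Positive", "Speech", "Skip"]
test_labels = ["Negative", "Neutral", "Positive", "Speech", "Skip"]
test_predictions = ["Negative", "Positive", "Positive", "Speech", "Neutral"]
ConfusionMatrixDisplay.from_predictions(
    test_labels, test_predictions, labels=CLASSES, normalize="true"
)
plt.show()
```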

Figure 3. Confusion matrix for RuSentiTweet.


Figure 4. Confusion matrix for RuSentiment, created using models from Smetanin & Komarov (2021a).


Table A2. Examples of tweets classification.

All usernames and URLs were replaced with keywords for anonymity purposes.

Tweet True class Predicted class
Russian English
@USERNAME @USERNAME @USERNAME @USERNAME @USERNAME Помедорус @USERNAME @USERNAME @USERNAME @USERNAME @USERNAME Pomedorus Skip Skip
@USERNAME ты не лохушка ЛОЛ я тебе завидую…. у меня травма из за интернета вот я лохушка @USERNAME you’re not a sucker LOL I envy you…. I’m traumatized because of the internet I’m a sucker Skip Negative
@USERNAME Котиков Одриосолу Дождь @USERNAME Cats Odriosolu Rain Skip Neutral
@USERNAME Уж лучше твоя грудь @USERNAME Your breasts are better Skip Positive
Как сережки URL How do you like the earrings URL Neutral Neutral
@USERNAME Реквием по мечте @USERNAME Requiem for a dream Neutral Positive
@USERNAME ПОДОЖДИ НУ МНЕ КАЗАЛОСЬ ДА @USERNAME WAIT I THINK YES Neutral Negative
@USERNAME Спокойной ночи и сладких снов @USERNAME Good night and sweet dreams Speech Speech
@USERNAME как дела моя хорошая?? ( ) @USERNAME how are you my dear?? ( ) Speech Positive
@USERNAME Это классно что у тебя есть эти люди @USERNAME It’s great that you have these people Positive Positive
На самом деле я ловлю уруру с этого облачка. In fact, I catch ururu from this cloud. Positive Neutral
n/a What kind of morons are you, many have a school/work day tomorrow Negative Negative
интересный факт: смысла в клипах тхт больше, чем в твоей жизни Interesting fact: there is more sense in txt clips than in your life Negative Neutral

Conclusion

In this article, we presented RuSentiTweet, a new general domain sentiment dataset of tweets in Russian with manual annotation. RuSentiTweet includes 13,392 tweets annotated by three annotators with moderate inter-rater agreement into five classes: Positive, Neutral, Negative, Speech Act, and Skip. Currently, RuSentiTweet is the only dataset of general domain tweets in Russian manually annotated by more than one annotator, and it is the largest in its class for Russian. Additionally, we presented a RuBERT-based model trained on RuSentiTweet, which demonstrated a macro F1 of 0.6594 in five-class classification. The code, data, and model have been made publicly available to the research community.

Further research might focus on several areas. Firstly, considerably more work must be done to determine the most efficient ML algorithm in terms of classification quality for RuSentiTweet. In particular, it could be interesting to apply explainable sentiment analysis approaches (e.g., Szczepański et al., 2021; Kumar & Raman, 2022) to allow a deeper understanding of the reasons for misclassification errors on particular texts. Secondly, it would be interesting to measure a subjective well-being index based on historical Russian tweets. Lastly, another possible area of future research would be to perform additional toxicity annotation of negative tweets from RuSentiTweet.

Supplemental Information

Supplemental Information 1. Ref Heart Emoji.
DOI: 10.7717/peerj-cs.1039/supp-1
Supplemental Information 2. Text Length Distribution.
DOI: 10.7717/peerj-cs.1039/supp-2
Supplemental Information 3. Smiling Face with Heart Eyes Emoji.
DOI: 10.7717/peerj-cs.1039/supp-3
Supplemental Information 4. Pleading Face Emoji.
DOI: 10.7717/peerj-cs.1039/supp-4
Supplemental Information 5. Growing Heart Emoji.
DOI: 10.7717/peerj-cs.1039/supp-5
Supplemental Information 6. Purple Heart Emoji.
DOI: 10.7717/peerj-cs.1039/supp-6
Supplemental Information 7. Rolling on the Floor Laughing Emoji.
DOI: 10.7717/peerj-cs.1039/supp-7
Supplemental Information 8. Loudly Crying Face Emoji.
DOI: 10.7717/peerj-cs.1039/supp-8
Supplemental Information 9. Face With Tears of Joy Emoji.
DOI: 10.7717/peerj-cs.1039/supp-9
Supplemental Information 10. Two Hearts Emoji.
DOI: 10.7717/peerj-cs.1039/supp-10
Supplemental Information 11. Pensive Face Emoji.
DOI: 10.7717/peerj-cs.1039/supp-11

Acknowledgments

We thank the authors of the Twitter Stream Grab for collecting public tweets and making them available to the research community. We also thank the authors of RuSentiment for making their annotation guidelines publicly available.

This research was supported in part through computational resources of HPC facilities at HSE University (Kostenetskiy, Chulkevich & Kozyrev, 2021).

Funding Statement

There was no external funding received for this study.

Footnotes

1

Assessing the quality of this language detection algorithm lies outside the scope of this study. Initial research in this direction has already been done in other studies; for example, Pavliy and Lewis (2016) compared the quality of Twitter’s language detection algorithm and Google’s Compact Language Detector on Ukrainian and Russian tweets. The authors found that Twitter’s algorithm correctly detects 92% of texts in Russian and has higher accuracy than Google’s Compact Language Detector.

2

At the time of writing, not all months of 2021 were available.


Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Sergey Smetanin conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The code and dataset are available at GitHub: https://github.com/sismetanin/rusentitweet.

The sentiment classification model is available at HuggingFace: https://huggingface.co/sismetanin/rubert-rusentitweet.

References

  • Ahmadi et al. (2017).Ahmadi Z, Skowron M, Stier A, Kramer S. An in-depth experimental comparison of RNTNs and CNNs for sentence modeling. International Conference on Discovery Science; Springer; 2017. pp. 144–152. [Google Scholar]
  • Aly & Atiya (2013).Aly M, Atiya A. LABR: a large scale Arabic book reviews dataset. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Short Papers); Association for Computational Linguistics; 2013. pp. 494–498. [Google Scholar]
  • Antonakaki, Fragopoulou & Ioannidis (2021).Antonakaki D, Fragopoulou P, Ioannidis S. A survey of Twitter research: data model, graph structure, sentiment analysis and attacks. Expert Systems with Applications. 2021;164:114006. doi: 10.1016/j.eswa.2020.114006. [DOI] [Google Scholar]
  • Araslanov, Komotskiy & Agbozo (2020).Araslanov E, Komotskiy E, Agbozo E. Assessing the impact of text preprocessing in sentiment analysis of short social network messages in the Russian language. 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI); Piscataway: IEEE; 2020. pp. 1–4. [Google Scholar]
  • Arefiev (2013).Arefiev A. Demographic changes are not good for the Russian language. Demoscope Weekly. 2013:571–572. [Google Scholar]
  • Babakov et al. (2021).Babakov N, Logacheva V, Kozlova O, Semenov N, Panchenko A. Detecting inappropriate messages on sensitive topics that could harm a company’s reputation. Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing; Kiyv, Ukraine: Association for Computational Linguistics; 2021. pp. 26–36. [Google Scholar]
  • Babakov, Logacheva & Panchenko (2022).Babakov N, Logacheva V, Panchenko A. Beyond plain toxic: detection of inappropriate statements on flammable topics for the Russian language. 2022. https://arxiv.org/abs/2203.02392
  • Babii, Kazyulina & Malafeev (2020).Babii A, Kazyulina M, Malafeev AY. Automatic emotion identification in Russian text messages. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue” 2020; Russian State University for the Humanities; 2020. pp. 1002–1010. [Google Scholar]
  • Babii, Kazyulina & Malafeev (2021).Babii A, Kazyulina M, Malafeev A. FastText-based methods for emotion identification in Russian internet discourse. WebSci ’21: 13th ACM Web Science Conference 2021; 2021. pp. 112–119. [Google Scholar]
  • Barnes, Øvrelid & Velldal (2019).Barnes J, Øvrelid L, Velldal E. Sentiment analysis is not solved! assessing and probing sentiment classification. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP; Florence, Italy: Association for Computational Linguistics; 2019. pp. 12–23. [Google Scholar]
  • Baylis (2020).Baylis P. Temperature and temperament: evidence from Twitter. Journal of Public Economics. 2020;184:104161. doi: 10.1016/j.jpubeco.2020.104161. [DOI] [Google Scholar]
  • Baylis et al. (2018).Baylis P, Obradovich N, Kryvasheyeu Y, Chen H, Coviello L, Moro E, Cebrian M, Fowler JH. Weather impacts expressed sentiment. PLOS ONE. 2018;13(4):e0195750. doi: 10.1371/journal.pone.0195750. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Baymurzina, Kuznetsov & Burtsev (2019).Baymurzina D, Kuznetsov D, Burtsev M. Language model embeddings improve sentiment analysis in Russian. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2019”; 2019. pp. 53–63. [Google Scholar]
  • Beckler et al. (2018).Beckler DT, Thumser ZC, Schofield JS, Marasco PD. Reliability in evaluator-based tests: using simulation-constructed models to determine contextually relevant agreement thresholds. BMC Medical Research Methodology. 2018;18(1):1–12. doi: 10.1186/s12874-018-0606-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Bermingham & Smeaton (2009).Bermingham A, Smeaton AF. A study of inter-annotator agreement for opinion retrieval. SIGIR ’09: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval; 2009. pp. 784–785. [Google Scholar]
  • Besacier et al. (2014).Besacier L, Barnard E, Karpov A, Schultz T. Automatic speech recognition for under-resourced languages: a survey. Speech Communication. 2014;56:85–100. doi: 10.1016/j.specom.2013.07.008. [DOI] [Google Scholar]
  • Bird, Klein & Loper (2009).Bird S, Klein E, Loper E. Natural language processing with Python: analyzing text with the natural language toolkit. Newton: O’Reilly Media, Inc; 2009. [Google Scholar]
  • Blinov, Kotelnikov & Pestov (2013).Blinov P, Kotelnikov E, Pestov O. Research of lexical approach and machine learning methods for sentiment analysis. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue” 2013; Russian State University for the Humanities; 2013. pp. 51–61. [Google Scholar]
  • Borodkina & Sibirev (2019).Borodkina O, Sibirev V. Migration issues in Russian Twitter: attitudes to migrants, social problems and online resources. In: El Yacoubi S, Bagnoli F, Pacini G, editors. Internet Science. Cham: Springer International Publishing; 2019. pp. 32–46. [Google Scholar]
  • Buntoro, Adji & Purnamasari (2016).Buntoro GA, Adji TB, Purnamasari AE. Sentiment analysis candidates of Indonesian Presiden 2014 with five class attribute. International Journal of Computer Applications. 2016;136(2):23–29. doi: 10.5120/ijca2016908288. [DOI] [Google Scholar]
  • Chetviorkin, Braslavskiy & Loukachevich (2013).Chetviorkin I, Braslavskiy P, Loukachevich N. Sentiment analysis track at ROMIP 2011. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue” 2012; Russian State University for the Humanities; 2013. pp. 1–14. [Google Scholar]
  • Chetviorkin & Loukachevitch (2013).Chetviorkin I, Loukachevitch N. Sentiment analysis track at ROMIP 2012. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue” 2013; 2013. p. 2. [Google Scholar]
  • Chizhik (2016).Chizhik A. Factors for forming social mood on the basis of the analysis of the emotional coloring of posts in the Russian-language Twitter. New Information Technologies in Automated Systems; HSE Tikhonov Moscow Institute of Electronics and Mathematics; 2016. pp. 61–64. [Google Scholar]
  • Devlin et al. (2019).Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Long and Short Papers; Minneapolis, Minnesota: Association for Computational Linguistics; 2019. pp. 4171–4186. [Google Scholar]
  • Dzogang et al. (2017a).Dzogang F, Goulding J, Lightman S, Cristianini N. Seasonal variation in collective mood via Twitter content and medical purchases. International Symposium on Intelligent Data Analysis; Springer; 2017a. pp. 63–74. [Google Scholar]
  • Dzogang, Lightman & Cristianini (2017b).Dzogang F, Lightman S, Cristianini N. Circadian mood variations in Twitter content. Brain and Neuroscience Advances. 2017b;1:2398212817744501. doi: 10.1177/2398212817744501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Fiok et al. (2021).Fiok K, Karwowski W, Gutierrez E, Wilamowski M. Analysis of sentiment in tweets addressed to a single domain-specific Twitter account: comparison of model performance and explainability of predictions. Expert Systems with Applications. 2021;186:115771. doi: 10.1016/j.eswa.2021.115771. [DOI] [Google Scholar]
  • Golubev & Loukachevitch (2020).Golubev A, Loukachevitch N. Improving results on Russian sentiment datasets. In: Filchenkov A, Kauttonen J, Pivovarova L, editors. Artificial Intelligence and Natural Language. Cham: Springer International Publishing; 2020. pp. 109–121. [Google Scholar]
  • Hillaire (2021).Hillaire GE. Understanding emotions in online learning: using emotional design and emotional measurement to unpack complex emotions during collaborative learning. Milton Keynes, UK: Open University; 2021. [Google Scholar]
  • Hillaire et al. (2021).Hillaire G, Rienties B, Fenton-O’Creevy M, Zdrahal Z, Tempelaar D. Open World Learning: Research, Innovation and the Challenges of High-Quality Education. London: Routledge; 2021. Incorporating student opinion into opinion mining; pp. 171–185. [DOI] [Google Scholar]
  • Jamadi Khiabani, Basiri & Rastegari (2020).Jamadi Khiabani P, Basiri ME, Rastegari H. An improved evidence-based aggregation method for sentiment analysis. Journal of Information Science. 2020;46(3):340–360. doi: 10.1177/0165551519837187. [DOI] [Google Scholar]
  • Kanev et al. (2022).Kanev AI, Savchenko GA, Grishin IA, Vasiliev DA, Duma EM. Sentiment analysis of multilingual texts using machine learning methods. 2022 Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus); Piscataway: IEEE; 2022. pp. 326–331. [Google Scholar]
  • Kausar, Soosaimanickam & Nasar (2021).Kausar MA, Soosaimanickam A, Nasar M. Public sentiment analysis on Twitter data during COVID-19 outbreak. International Journal of Advanced Computer Science and Applications. 2021;12(2):415–422. doi: 10.14569/issn.2156-5570. [DOI] [Google Scholar]
  • Kazyulina, Babii & Malafeev (2020).Kazyulina M, Babii A, Malafeev A. Emotion classification in Russian: feature engineering and analysis. International Conference on Analysis of Images, Social Networks and Texts; Springer; 2020. pp. 135–148. [Google Scholar]
  • Kirilenko & Stepchenkova (2017).Kirilenko AP, Stepchenkova SO. Sochi 2014 Olympics on Twitter: perspectives of hosts and guests. Tourism Management. 2017;63:54–65. doi: 10.1016/j.tourman.2017.06.007. [DOI] [Google Scholar]
  • Koltsova, Alexeeva & Kolcov (2016).Koltsova OY, Alexeeva S, Kolcov S. An opinion word lexicon and a training dataset for Russian sentiment analysis of social media. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue” 2016; Russian State University for the Humanities; 2016. pp. 277–287. [Google Scholar]
  • Konstantinov, Moshkin & Yarushkina (2021).Konstantinov A, Moshkin V, Yarushkina N. Approach to the use of language models BERT and Word2Vec in sentiment analysis of social network texts. Recent Research in Control Engineering and Decision Making; Cham: Springer International Publishing; 2021. pp. 462–473. [Google Scholar]
  • Korablinov & Braslavski (2020).Korablinov V, Braslavski P. RUBQ: a Russian dataset for question answering over wikidata. International Semantic Web Conference; Cham: Springer; 2020. pp. 97–110. [Google Scholar]
  • Kostenetskiy, Chulkevich & Kozyrev (2021).Kostenetskiy P, Chulkevich R, Kozyrev V. HPC resources of the Higher School of Economics. Journal of Physics: Conference Series. 2021;1740:12050. doi: 10.1088/1742-6596/1740/1/012050. IOP Publishing. [DOI] [Google Scholar]
  • Kotelnikov (2021).Kotelnikov E. Current landscape of the Russian sentiment corpora. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue” 2021; Russian State University for the Humanities; 2021. pp. 433–444. [Google Scholar]
  • Kotelnikova (2020).Kotelnikova A. Comparison of deep learning and rule-based method for the sentiment analysis task. 2020 International Multi-Conference on Industrial Engineering and Modern Technologies (FarEastCon); Piscataway: IEEE; 2020. pp. 1–6. [Google Scholar]
  • Krippendorff (1980).Krippendorff K. Content analysis: an introduction to its methodology. Thousand Oaks: SAGE; 1980. [Google Scholar]
  • Krippendorff (2004).Krippendorff K. Reliability in content analysis: some common misconceptions and recommendations. Human Communication Research. 2004;30(3):411–433. doi: 10.1111/j.1468-2958.2004.tb00738.x. [DOI] [Google Scholar]
  • Kumar & Raman (2022).Kumar P, Raman B. A BERT based dual-channel explainable text emotion recognition system. Neural Networks. 2022;150:392–407. doi: 10.1016/j.neunet.2022.03.017. [DOI] [PubMed] [Google Scholar]
  • Kuratov & Arkhipov (2019).Kuratov Y, Arkhipov M. Adaptation of deep bidirectional multilingual transformers for Russian language. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue” 2019; Russian State University for the Humanities; 2019. pp. 333–340. [Google Scholar]
  • Landis & Koch (1977).Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174. doi: 10.2307/2529310. [DOI] [PubMed] [Google Scholar]
  • Larsen et al. (2015).Larsen ME, Boonstra TW, Batterham PJ, O’Dea B, Paris C, Christensen H. We Feel: mapping emotion on Twitter. IEEE Journal of Biomedical and Health Informatics. 2015;19(4):1246–1252. doi: 10.1109/JBHI.2015.2403839. [DOI] [PubMed] [Google Scholar]
  • Leetaru (2019).Leetaru K. Is Twitter’s spritzer stream really a nearly perfect 1% sample of its firehose? Forbes. 2019 [Google Scholar]
  • Li et al. (2018).Li M, Ch’ng E, Chong AYL, See S. Multi-class Twitter sentiment classification with emojis. Industrial Management & Data Systems. 2018;118(9):1804–1820. doi: 10.1108/IMDS-12-2017-0582. [DOI] [Google Scholar]
  • Lopatin & Ulukhanov (2017).Lopatin V, Ulukhanov I. Russian language. Languages of the World. 2017:276–540. [Google Scholar]
  • Loukachevitch et al. (2015).Loukachevitch N, Blinov P, Kotelnikov E, Rubtsova Y, Ivanov V, Tutubalina E. SentiRuEval: testing object-oriented sentiment analysis systems in Russian. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue” 2015; Russian State University for the Humanities; 2015. pp. 3–13. [Google Scholar]
  • Lukashevich & Rubtsova (2016).Lukashevich N, Rubtsova YV. Sentirueval-2016: overcoming time gap and data sparsity in tweet sentiment analysis. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue” 2016; Russian State University for the Humanities; 2016. pp. 416–426. [Google Scholar]
  • Mozetič, Grčar & Smailović (2016).Mozetič I, Grčar M, Smailović J. Multilingual Twitter sentiment classification: the role of human annotators. PLOS ONE. 2016;11(5):e0155036. doi: 10.1371/journal.pone.0155036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Muhammad et al. (2022).Muhammad SH, Adelani DI, Ahmad IS, Abdulmumin I, Bello BS, Choudhury M, Emezue CC, Aremu A, Abdul S, Brazdil P. NaijaSenti: a Nigerian Twitter sentiment corpus for multilingual sentiment analysis. 2022. https://arxiv.org/abs/2201.08277
  • Pak & Paroubek (2012).Pak A, Paroubek P. Language independent approach to sentiment analysis (LIMSI participation in ROMIP’11). Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue” 2012; 2012. pp. 37–50. [Google Scholar]
  • Pavliy and Lewis (2016).Pavliy B, Lewis J. The performance of Twitter’s language detection algorithm and Google’s Compact Language Detector on language detection in Ukrainian and Russian tweets. Bulletin of Toyama University of International Studies. 2016;8:99–106. [Google Scholar]
  • Pontiki et al. (2016).Pontiki M, Galanis D, Papageorgiou H, Androutsopoulos I, Manandhar S, AL-Smadi M, Al-Ayyoub M, Zhao Y, Qin B, De Clercq O, Hoste V, Apidianaki M, Tannier X, Loukachevitch N, Kotelnikov E, Bel N, Jiménez-Zafra SM, Eryigit G. SemEval-2016 task 5: aspect based sentiment analysis. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016); San Diego, California: Association for Computational Linguistics; 2016. pp. 19–30. [Google Scholar]
  • Prata et al. (2016).Prata DN, Soares KP, Silva MA, Trevisan DQ, Letouze P. Social data analysis of Brazilian’s mood from Twitter. International Journal of Social Science and Humanity. 2016;6(3):179–183. doi: 10.7763/IJSSH.2016.V6.640. [DOI] [Google Scholar]
  • Pronoza et al. (2021).Pronoza E, Panicheva P, Koltsova O, Rosso P. Detecting ethnicity-targeted hate speech in Russian social media texts. Information Processing & Management. 2021;58(6):102674. doi: 10.1016/j.ipm.2021.102674. [DOI] [Google Scholar]
  • Read (2005).Read J. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. Proceedings of the ACL Student Research Workshop; Ann Arbor, Michigan: Association for Computational Linguistics; 2005. pp. 43–48. [Google Scholar]
  • Rodina & Kutuzov (2020).Rodina J, Kutuzov A. RuSemShift: a dataset of historical lexical semantic change in Russian. Proceedings of the 28th International Conference on Computational Linguistics; Barcelona, Spain: International Committee on Computational Linguistics; 2020. pp. 1037–1047. [Google Scholar]
  • Rogers et al. (2018).Rogers A, Romanov A, Rumshisky A, Volkova S, Gronas M, Gribov A. RuSentiment: an enriched sentiment analysis dataset for social media in Russian. Proceedings of the 27th International Conference on Computational Linguistics; Santa Fe, New Mexico, USA: Association for Computational Linguistics; 2018. pp. 755–763. [Google Scholar]
  • Rosstat (2022).Rosstat. How many people live in Russia: Rosstat announced the first results of the census. 2022. https://www.strana2020.ru/novosti/skolko-lyudey-zhivet-v-rossii-rosstat-ozvuchil-pervye-itogi-perepisi/
  • Rubtsova (2013).Rubtsova Y. A method for development and analysis of short text corpus for the review classification task. Trudy XV Vserossiiskoy Naychnoy Konferencii RCDL’2013; 2013. pp. 269–275. [Google Scholar]
  • Salminen et al. (2018).Salminen JO, Al-Merekhi HA, Dey P, Jansen BJ. Inter-rater agreement for social computing studies. 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS); IEEE; 2018. pp. 80–87. [Google Scholar]
  • Smetanin (2017).Smetanin S. The program for public mood monitoring through Twitter content in Russia. Proceedings of the Institute for System Programming of the RAS. 2017;29(4):315–324. doi: 10.15514/ISPRAS-2017-29(4)-22. [DOI] [Google Scholar]
  • Smetanin (2020a).Smetanin S. The applications of sentiment analysis for Russian language texts: current challenges and future perspectives. IEEE Access. 2020a;8:110693–110719. doi: 10.1109/ACCESS.2020.3002215. [DOI] [Google Scholar]
  • Smetanin (2020b).Smetanin S. Toxic comments detection in Russian. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue” 2020; Russian State University for the Humanities; 2020b. [Google Scholar]
  • Smetanin & Komarov (2019).Smetanin S, Komarov M. Sentiment analysis of product reviews in Russian using convolutional neural networks. 2019 IEEE 21st Conference on Business Informatics (CBI); IEEE; 2019. pp. 482–486. [Google Scholar]
  • Smetanin & Komarov (2021a).Smetanin S, Komarov M. Deep transfer learning baselines for sentiment analysis in Russian. Information Processing & Management. 2021a;58(3):102484. doi: 10.1016/j.ipm.2020.102484. [DOI] [Google Scholar]
  • Smetanin & Komarov (2021b).Smetanin S, Komarov M. Share of toxic comments among different topics: the case of Russian social networks. 2021 IEEE 23rd Conference on Business Informatics (CBI); Piscataway: IEEE; 2021b. pp. 65–70. [Google Scholar]
  • Szczepański et al. (2021).Szczepański M, Pawlicki M, Kozik R, Choraś M. New explainability method for BERT-based model in fake news detection. Scientific Reports. 2021;11(1):1–13. doi: 10.1038/s41598-021-03100-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Szmigiera (2022).Szmigiera M. The most spoken languages worldwide in 2022. 2022. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
  • Ten Thij, Bhulai & Kampstra (2014).Ten Thij M, Bhulai S, Kampstra P. Circadian patterns in Twitter. IARIA Data Analytics 2014; 2014. pp. 12–17. [Google Scholar]
  • Tripto & Ali (2018).Tripto NI, Ali ME. Detecting multilabel sentiment and emotions from Bangla YouTube comments. 2018 International Conference on Bangla Speech and Language Processing (ICBSLP); Piscataway: IEEE; 2018. pp. 1–6. [Google Scholar]
  • Wang, Callan & Zheng (2015).Wang Y, Callan J, Zheng B. Should we use the sample? Analyzing datasets sampled from Twitter’s stream API. ACM Transactions on the Web. 2015;9(3):1–23. doi: 10.1145/2746366. [DOI] [Google Scholar]
  • Wolf et al. (2020).Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Davison J, Shleifer S, von Platen P, Ma C, Jernite Y, Plu J, Xu C, Scao TL, Gugger S, Drame M, Lhoest Q, Rush AM. Transformers: state-of-the-art natural language processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations; Association for Computational Linguistics; 2020. pp. 38–45. [Google Scholar]
  • Zueva, Kabirova & Kalaidin (2020).Zueva N, Kabirova M, Kalaidin P. Reducing unintended identity bias in Russian hate speech detection. Proceedings of the Fourth Workshop on Online Abuse and Harms; 2020. pp. 65–69. [Google Scholar]
