. 2023 Jun 5:1–39. Online ahead of print. doi: 10.1007/s11227-023-05423-9

Topic sentiment analysis based on deep neural network using document embedding technique

Azam Seilsepour 1, Reza Ravanmehr 1, Ramin Nassiri 1
PMCID: PMC10241384  PMID: 37359345

Abstract

Sentiment Analysis (SA) is a domain- or topic-dependent task, since polarity terms convey different sentiments in different domains. Hence, machine learning models trained on one domain cannot be employed in others, and existing domain-independent lexicons cannot correctly recognize the polarity of domain-specific polarity terms. Conventional approaches to Topic Sentiment Analysis (TSA) perform Topic Modeling (TM) and SA sequentially, relying on models previously trained on unrelated datasets to classify sentiments, which cannot provide acceptable accuracy. Other researchers perform TM and SA simultaneously using topic-sentiment joint models, which require a list of seed words and their sentiments drawn from widely used domain-independent lexicons; as a result, these methods cannot correctly determine the polarity of domain-specific terms. This paper proposes a novel supervised hybrid TSA approach, called Embedding Topic Sentiment Analysis using Deep Neural Networks (ETSANet), that extracts the semantic relationships between the hidden topics and the training dataset using a Semantically Topic-Related Documents Finder (STRDF). STRDF discovers the training documents that share a context with a topic, based on the semantic relationships between the training dataset and the Semantic Topic Vector, a newly introduced concept that encompasses the semantic aspects of a topic. A hybrid CNN–GRU model is then trained on these semantically topic-related documents. Moreover, a hybrid metaheuristic method combining Grey Wolf Optimization and the Whale Optimization Algorithm is employed to fine-tune the hyperparameters of the CNN–GRU network. The evaluation results demonstrate that ETSANet improves on the accuracy of state-of-the-art methods by 1.92%.

Keywords: Topic sentiment analysis, Topic modeling, Semantic topic vector, Semantic similarity, GRU, CNN

Introduction

The rapid expansion of social media has changed how people express their feelings and opinions about events, products, etc. Microblogs, instant messaging, and online product reviews enable users to express their feelings freely on any topic. For example, manufacturers can take advantage of customer reviews of new products to promote their goods. New customers can use buyers' comments to obtain an informed overview before purchasing new products. Politicians can review users' critiques of important news and improve their pre-election behavior. In all of these scenarios, the Sentiment Analysis (SA) task needs to be performed for each topic. SA is one of the most important natural language processing (NLP) tasks and has been studied by researchers for many years. Research in SA is generally classified into Machine Learning (ML) and lexicon-based approaches [1, 2].

Within the scope of SA, the sentimental polarity of words is inherently dependent on the domain [3]. For instance, the adjective "unpredictable" has negative polarity in the phrase "unpredictable steering" in a car review but positive polarity in the phrase "unpredictable plot" in a movie review. Similarly, the adjective "low" has negative polarity in "low salary" but positive polarity in "low price". This means that the domain or context in which a polarity word is used must be considered in SA. Therefore, ML models trained in a specific domain cannot be used in other domains, because sentiment words have different polarities in different domains [4]. In addition, domain-independent lexicons cannot correctly detect the polarity of domain-dependent terms. Topic Sentiment Analysis (TSA), which combines Topic Modeling (TM) and SA, aims to address this problem by capturing the relationships between the topics or domains and the sentiments. TSA approaches are commonly categorized into conventional and topic-sentiment joint models.

Conventional TSA approaches perform the TM and SA tasks in a two-stage pipeline [5–18]. In real-world applications of TSA, the topic detection dataset has no labels, so these methods use a previously trained classifier or a domain-independent lexicon to classify the dataset. In the first case, the classifier was trained on a semantically irrelevant dataset that is not in the same context as the topics; hence, the semantic relationships between the topics and the training dataset are ignored. In the second case, domain-independent lexicons do not correctly recognize the polarity of domain-dependent polarity terms. As a result, these methods do not provide the desired accuracy, since they do not consider the domain or context. For instance, Guo et al. [19] proposed a TSA method, called BJ-LDA, that uses Latent Dirichlet Allocation (LDA) [20] to perform topic detection and Maximum Entropy to separate aspect words from their corresponding opinion words, describing each brand from a detailed perspective. Carvache et al. [8] and Abiola et al. [10] collected tweets on COVID-19 and used LDA to extract the hidden topics; the former employs SentiStrength, and the latter uses VADER and TextBlob, to determine the sentiments corresponding to each topic.

The other family of TSA approaches, topic-sentiment joint models, performs TM and SA simultaneously [4, 21–31]. They usually customize the LDA method by adding a sentiment layer. These approaches use a list of polarity terms and their polarities as seeds. Since these seeds are borrowed from widely used domain-independent lexicons, the polarity of domain-dependent polarity terms is not recognized correctly. Additionally, they do not obtain satisfactory results when the training corpus is small or the samples are short [4]. For instance, JST (Joint Sentiment/Topic) [21] was developed by adding a sentiment layer to the LDA algorithm. The multimodal JST (MJST) [26] method also adds a sentiment layer to LDA and uses additional features, such as emoticons and personality factors, to analyze the author's sentiments. Unsupervised Topic-Sentiment Joint (UTSJ) [29] is also based on the LDA algorithm and takes advantage of Support Vector Machine (SVM) and Random Forest (RF) classifiers to distinguish between real and fake reviews. Osmani et al. [22] extended LDA with extra features such as date, author, helpfulness, and subtopic in addition to sentiment. Nimala and Jebakumar [30] also proposed an LDA-based approach to assess students' opinions about different topics.

Recently, researchers have benefited from Recurrent Neural Networks (RNN) such as Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) to develop TSA approaches since these networks are capable of capturing the long-term dependencies and solving the vanishing gradient problem of traditional RNNs. For instance, Topic Sentiment Joint Model with Word Embeddings Dependency (TSJM-WED) [25] uses GRU and LSTM as well as word embeddings for TSA. Topic-Dependent Attention Model (TDAM) [31] utilizes an Attention-based mechanism and a Bidirectional GRU neural network for TSA. Jelodar et al. [5] have utilized LSTM for TSA in COVID-19 online discussions. Moreover, it has been demonstrated that Convolutional Neural Networks (CNN), a kind of feed-forward neural network, improve text classification accuracy because they have a strong capacity for extracting local and deep features from text using convolution layers.

The problem with the TSA approaches mentioned above is that they ignore the semantic relationships between topics and sentiments: they do not consider the domains in which polarity words are used, which leads to low accuracy. To overcome this problem, this paper proposes a novel hybrid TSA approach, called ETSANet. ETSANet finds the documents that are in the same domain as the topics and trains an ML model on these documents. It proposes a new method, called Semantically Topic-Related Documents Finder (STRDF). STRDF introduces a new concept, called the Semantic Topic Vector, that encompasses the semantic aspects (domain) of a topic. Concretely, the Semantic Topic Vector is the average of the document vectors of the words composing a topic. STRDF creates the Semantic Topic Vector of each topic and discovers the training documents that are in the same domain as the topic, called semantically topic-related documents. To find them, ETSANet uses a semantic similarity measure. These semantically topic-related documents are then fed into a hybrid Deep Neural Network (DNN) model composed of a CNN and a GRU. The CNN–GRU model is trained with the labeled semantically topic-related documents corresponding to each topic. As a result, by finding semantically topic-related documents and training an ML model on them, ETSANet addresses the stated problem of TSA approaches. Moreover, ETSANet utilizes GWO–WOA [32], a hybrid metaheuristic algorithm composed of Grey Wolf Optimization (GWO) [33] and the Whale Optimization Algorithm (WOA) [34], to fine-tune the hyperparameters of the CNN–GRU network. The source code for reproducing the experiments can be found in our GitHub repository.1
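The selection step can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the helper names, the toy 2-D vectors, and the 0.7 threshold are our own assumptions. It averages word vectors into a Semantic Topic Vector and keeps the documents whose vectors are cosine-similar to it.

```python
import numpy as np

def semantic_topic_vector(topic_words, vectors):
    """Average the vectors of a topic's words into one Semantic Topic Vector."""
    return np.mean([vectors[w] for w in topic_words], axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def topic_related_documents(topic_vec, doc_vecs, threshold=0.7):
    """Indices of documents whose embedding is close enough to the topic."""
    return [i for i, d in enumerate(doc_vecs) if cosine(topic_vec, d) >= threshold]

# Toy 2-D vectors: document 0 points in the same direction as the topic words.
vectors = {"price": np.array([1.0, 0.1]), "salary": np.array([0.9, 0.2])}
topic_vec = semantic_topic_vector(["price", "salary"], vectors)
doc_vecs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
related = topic_related_documents(topic_vec, doc_vecs)  # only doc 0 qualifies
```

In the full pipeline, the documents selected this way become the labeled training set for the CNN–GRU classifier of the corresponding topic.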

The main contributions of the article can be summarized as follows:

  • Proposing a novel TSA method, called ETSANet, based on semantic similarity, document embedding, and DNNs;

  • Proposing a new method, called STRDF, that finds the semantic relationships between the topics and the training dataset using a newly introduced concept called Semantic Topic Vector;

  • Developing a hybrid CNN–GRU network to classify the semantically topic-related documents and identify the overall sentimental polarity corresponding to each topic;

  • Utilizing the GWO–WOA to fine-tune the hyperparameters of the CNN–GRU network.

The rest of the paper is organized as follows. Section 2 explains the basics of the research and related works. Section 3 describes ETSANet in more detail, and Sect. 4 presents the evaluation results. Finally, Sect. 5 concludes the paper and outlines future work.

Background of the research

In this Section, first, we explain the basic concepts of our research, which are the building blocks of the proposed method, to provide the fundamental knowledge for readers to understand ETSANet better. Then, a review of previous major TSA approaches is conducted. Finally, we explain the research gap.

Topic modeling

TM is an essential task in text analysis that automatically examines the content of a text body and extracts meaningful units called topics. The most widely used TM algorithm, introduced by Blei et al. [20], is LDA, a generative probabilistic model for discrete data such as text collections. It is a three-level hierarchical Bayesian model in which each document of the corpus is modeled as a finite mixture of topics. It is assumed that each document can be expressed as a probability distribution over topics, and each topic as a probability distribution over words [20].

The LDA method receives the number of topics to be discovered as input and finds the topics. To choose this number, the topic coherence measure proposed in [35] is used, which captures the semantic interpretability of the discovered topics. Topic coherence is based on a sliding window, a one-set segmentation of the top words, and an indirect confirmation metric that employs Normalized Pointwise Mutual Information (NPMI) and cosine similarity. It obtains co-occurrence counts for the top words using a sliding window of size 110, the default value proposed in [35]. These counts are then used to calculate the NPMI of every top word with respect to every other top word, yielding a set of vectors, one per top word. The one-set segmentation of the top words then leads to the calculation of the cosine similarity between every top-word vector and the sum of all top-word vectors. Finally, the topic coherence is the arithmetic mean of these cosine similarities.
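The coherence computation described above can be sketched as follows. This is a rough, self-contained approximation of the C_v-style measure of [35]; reference implementations (e.g. gensim's CoherenceModel) differ in details such as probability estimation and segmentation.

```python
import math

def cv_coherence(top_words, docs, window=110):
    """Sketch: sliding-window co-occurrence counts, one NPMI vector per top
    word, and the mean cosine similarity of each vector to the sum vector."""
    # 1. Sliding windows over every tokenized document (whole doc if shorter).
    windows = []
    for doc in docs:
        if len(doc) <= window:
            windows.append(set(doc))
        else:
            windows.extend(set(doc[i:i + window])
                           for i in range(len(doc) - window + 1))
    n = len(windows)
    p = {w: sum(w in win for win in windows) / n for w in top_words}

    def npmi(a, b):
        if a == b:
            return 1.0
        p_ab = sum(a in win and b in win for win in windows) / n
        if p_ab == 0:
            return -1.0
        pmi = math.log(p_ab / (p[a] * p[b]))
        return 1.0 if p_ab == 1 else pmi / (-math.log(p_ab))

    # 2. One NPMI vector per top word (the "one-set segmentation").
    vectors = {w: [npmi(w, u) for u in top_words] for w in top_words}
    total = [sum(v[i] for v in vectors.values()) for i in range(len(top_words))]

    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(y * y for y in v))
        return dot / (nu * nv)

    # 3. Coherence is the arithmetic mean of the cosine similarities.
    return sum(cos(vectors[w], total) for w in top_words) / len(top_words)
```

Running this for several candidate topic counts and keeping the count with the highest average coherence is the usual way the number of topics is chosen.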

Word and document embedding

In NLP, the bag-of-words and bag-of-n-grams methods are used to construct feature vectors. Both ignore the meaning of words, and the bag-of-words method additionally ignores word order. Although the bag-of-n-grams method considers the order of words in short texts, it suffers from data sparsity and high dimensionality. Word embedding methods have been proposed to overcome these weaknesses [36].

Word embedding is a way of representing words that attempts to convey their morphological, syntactic, and semantic information in the form of feature vectors. It represents words with similar meanings and contexts with similar vectors. Word embedding plays an essential role in various NLP tasks, especially SA [37]. Recent word embedding methods follow the distributional hypothesis: words used in similar contexts have similar meanings and similar features. In recent years, various word embedding methods have been proposed that turn words into feature vectors. One of them is Word2Vec, which is based on two models, skip-gram and Continuous Bag-of-Words (CBOW). In skip-gram, the middle word is given and the model predicts the words on both sides of it; in CBOW, the middle word is predicted from the words on both sides [36].
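The difference between the two models is easiest to see in the training pairs they generate. The toy generator below is our own illustration, not Word2Vec itself: it enumerates (center, context) pairs for skip-gram and (context, center) pairs for CBOW.

```python
def training_pairs(tokens, window=2):
    """Enumerate the training pairs the two Word2Vec models would see."""
    skip_gram, cbow = [], []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        skip_gram += [(center, c) for c in context]  # center predicts each context word
        cbow.append((context, center))               # context predicts the center word
    return skip_gram, cbow

sg, cb = training_pairs(["low", "price", "is", "good"])
```

A real Word2Vec implementation trains a shallow neural network over pairs like these; only the pair construction is shown here.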

Document embedding or paragraph vectors are approaches to converting documents of different lengths, such as sentences and paragraphs, into fixed-length feature vectors. These methods are indeed an extension of the word embedding methods. One of the most widely used document embedding methods is Doc2Vec [38], introduced by Word2Vec developers [36]. It is an unsupervised framework that learns the vector representation of continuously distributed text pieces such as sentences, paragraphs, or large texts.

The Doc2Vec method comes in two forms: Paragraph Vectors-Distributed Bag of Words (PV-DBOW) and Paragraph Vectors-Distributed Memory (PV-DM). In PV-DM, the paragraph vector and word vectors are concatenated or averaged to predict the next word in the text. The paragraph token can be thought of as another word: it acts as a memory that remembers what is missing from the current context. For this reason, this model is called the distributed memory model of paragraph vectors [38].

In PV-DBOW, context words are ignored in the input. The model randomly samples words from the paragraph and predicts them in the output. PV-DBOW is therefore simpler and requires less memory; it acts like the skip-gram technique and ignores the order of words. PV-DM, on the other hand, is more complex, has more parameters, and is similar to CBOW [38].
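The contrast between PV-DM and PV-DBOW can likewise be illustrated by the training samples each draws. The generators below are toy sketches with hypothetical helper names, not Doc2Vec's actual training loop.

```python
import random

def pv_dm_samples(doc_id, tokens, window=2):
    """PV-DM: the paragraph token plus a window of words predicts the next word."""
    return [((doc_id, tokens[i:i + window]), tokens[i + window])
            for i in range(len(tokens) - window)]

def pv_dbow_samples(doc_id, tokens, k=3, seed=0):
    """PV-DBOW: the paragraph token alone predicts randomly sampled words."""
    rng = random.Random(seed)
    return [(doc_id, rng.choice(tokens)) for _ in range(k)]

dm = pv_dm_samples("doc_0", ["the", "plot", "was", "unpredictable"])
dbow = pv_dbow_samples("doc_0", ["the", "plot", "was", "unpredictable"])
```

PV-DM conditions on both the paragraph token and the word context, while PV-DBOW conditions only on the paragraph token, which is why the latter needs less memory.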

CNN/GRU

Deep learning is one of the most popular emerging fields in ML, originating from artificial neural networks. DNNs are usually developed by a set of sequential layers to learn data representations. Since DNNs can automatically identify and extract the features of the texts, they are increasingly employed in NLP tasks such as SA. Among the various types of DNNs, RNNs and CNNs have been widely used in SA [39].

A CNN is a feed-forward neural network that usually includes three kinds of layers: convolution, pooling, and fully connected. Feature extraction and feature reduction are performed by the convolution and pooling layers, respectively. The convolution layers apply various filters, or kernels, to the embeddings to perform feature selection and create feature maps; these filters are also called feature detectors. The pooling layer reduces the dimension of the feature maps, which decreases the computational workload and speeds up the following layers. Max-pooling, the most commonly used type of pooling, takes the maximum value of each window. The last layer is a classical fully connected neural network that contains a classifier function and connects the features of the text or image to the target classes [40].

The structure of CNNs varies only slightly across research fields. For instance, the following structure is widely used in NLP [41]. Assume d is the size of the word embedding and l is the length of the sentence. Let C \in \mathbb{R}^{d \times l} be the matrix representing the sentence, and c_i \in \mathbb{R}^d the word embedding of the i-th word. To create a new feature, a convolution layer uses a convolution kernel K \in \mathbb{R}^{d \times w} that is applied to a window of w words. The feature fm_i is created from a window of words C_{:, i:i+w} using:

fm_i = \sigma(C_{:, i:i+w} \odot K + b)    (1)

where \sigma is a nonlinear activation function, usually ReLU or tanh, \odot is the Hadamard product between matrices, and b \in \mathbb{R} is a bias term. To create a feature map, the convolutional kernel is applied to every window in the sentence:

fm = [fm_1, fm_2, \ldots, fm_{l-w+1}], \quad fm \in \mathbb{R}^{l-w+1}    (2)

In the next step, the output of the convolutional layer is fed into the pooling layer, where the pooling operation is applied to the feature maps to select the most important features and reduce the dimensions. For instance, for a pool size of 2, the max-pooling operation calculates the output as:

p_i = \max(fm_{2i-1}, fm_{2i})    (3)

Then, the output vector is:

P = [p_1, p_2, \ldots, p_{(l-w+1)/2}], \quad P \in \mathbb{R}^{(l-w+1)/2}    (4)
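As a sanity check on Eqs. (1)–(4), they can be written out as a small numpy sketch. This is our own illustrative code: it assumes ReLU as the activation and sums over the Hadamard product so that each window yields one scalar feature.

```python
import numpy as np

def text_conv_maxpool(C, K, b=0.0, pool=2):
    """Toy version of Eqs. (1)-(4): slide a d-by-w kernel over the d-by-l
    sentence matrix (Hadamard product, summed, plus bias), apply ReLU,
    then max-pool non-overlapping windows of the feature map."""
    d, l = C.shape
    _, w = K.shape
    # Eqs. (1)-(2): one scalar feature per window of w words.
    fm = np.array([np.maximum(0.0, np.sum(C[:, i:i + w] * K) + b)
                   for i in range(l - w + 1)])
    # Eqs. (3)-(4): non-overlapping max-pooling with the given pool size.
    pooled = np.array([fm[i:i + pool].max()
                       for i in range(0, len(fm) - pool + 1, pool)])
    return fm, pooled

# Toy sentence: l = 5 words, d = 3 embedding dimensions, kernel width w = 2.
C = np.arange(15, dtype=float).reshape(3, 5)
K = np.ones((3, 2))
fm, P = text_conv_maxpool(C, K)  # fm has l - w + 1 = 4 entries, P has 2
```

In a real CNN layer, many such kernels run in parallel, each producing its own feature map before pooling.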

RNNs can process current data using previous data via directed cycles. They use a memory mechanism to retain the state of previous data, but face the vanishing gradient and exploding gradient problems when learning long-term dependencies. LSTM and GRU networks, advanced variants of RNNs, have been proposed to solve this problem [42].

GRU is a type of RNN, similar to LSTM, that has been proposed to solve the vanishing and exploding gradient problems. GRU networks have lower computational complexity and a simpler architecture than LSTM, and have shown better performance in many research areas, including NLP [43]. Unlike LSTM, there is no memory cell in the GRU architecture. Each GRU cell includes two gates, update and reset. The former, denoted z_t, specifies how much information should be kept for the future; the latter, denoted r_t, determines to what extent previous information should be forgotten. The input h_{t-1} carries the information from the previous state (t-1), and \hat{h}_t is the candidate state that combines the new input with the reset-gated previous state. In the following equations, \sigma represents the sigmoid function, and \odot denotes element-wise multiplication [44].

r_t = \sigma(W_r x_t + U_r h_{t-1})    (5)
z_t = \sigma(W_z x_t + U_z h_{t-1})    (6)
\hat{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))    (7)
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t    (8)
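Equations (5)–(8) translate almost line for line into numpy. The toy cell below uses random, untrained weights and hypothetical dimensions; it is only meant to make the gate computations concrete, not to reproduce the paper's network.

```python
import numpy as np

def gru_step(x_t, h_prev, Wr, Ur, Wz, Uz, W, U):
    """One GRU time step following Eqs. (5)-(8)."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    r = sigmoid(Wr @ x_t + Ur @ h_prev)          # reset gate, Eq. (5)
    z = sigmoid(Wz @ x_t + Uz @ h_prev)          # update gate, Eq. (6)
    h_hat = np.tanh(W @ x_t + U @ (r * h_prev))  # candidate state, Eq. (7)
    return (1.0 - z) * h_prev + z * h_hat        # new hidden state, Eq. (8)

# Toy dimensions: hidden size 2, input size 3, random (untrained) weights.
rng = np.random.default_rng(0)
Wr, Wz, W = (rng.standard_normal((2, 3)) for _ in range(3))
Ur, Uz, U = (rng.standard_normal((2, 2)) for _ in range(3))
h = gru_step(np.ones(3), np.zeros(2), Wr, Ur, Wz, Uz, W, U)
```

Because z_t lies in (0, 1) and tanh is bounded, every component of the new hidden state stays within (-1, 1) when starting from a zero state.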

Hyperparameter optimization

Recently, metaheuristic algorithms have been widely used to find the best values of hyperparameters. For instance, GWO has been utilized in [45, 46], and [47] to fine-tune the hyperparameters of CNN models, and WOA has been employed in [48] for the same purpose. GWO [33] is a metaheuristic algorithm inspired by the hunting behavior of grey wolves, which hunt in packs that follow a dominance hierarchy. WOA [34] is another metaheuristic algorithm, inspired by the bubble-net hunting strategy of humpback whales, which swim around their prey and create bubbles along a '9'-shaped path.

The motivation for using the GWO algorithm is its capability to avoid local optima [46]. Additionally, the capability of WOA to balance exploration and exploitation, compared to state-of-the-art metaheuristic algorithms, has been demonstrated [48]. However, GWO updates a solution based on the positions of the three best agents, which makes it good at exploitation but poor at exploration. WOA, on the other hand, updates a solution based on a randomly generated solution, making it more exploration-oriented but prone to slow convergence. Hence, combining these algorithms can balance exploration and exploitation [32]. Serial GWO–WOA [32] updates a solution in two phases: first, the GWO operators update the solution around the three best solutions (alpha, beta, and delta); then, the WOA operators extend the search to other promising areas. In this approach, GWO and WOA work on the same population.
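A highly simplified sketch of this two-phase update follows. It reflects our own reading of serial GWO–WOA; the actual algorithm in [32] also switches probabilistically between WOA's encircling and spiral moves and decays the coefficient a over iterations, which are omitted here.

```python
import numpy as np

def gwo_woa_step(x, alpha, beta, delta, a, rng):
    """One two-phase update of a single solution vector: a GWO move toward
    the three leaders, followed by a WOA-style spiral around the best one."""
    # Phase 1: GWO operators update the solution around alpha, beta, delta.
    moves = []
    for leader in (alpha, beta, delta):
        A = a * (2 * rng.random(x.size) - 1)  # exploration coefficient
        C = 2 * rng.random(x.size)
        D = np.abs(C * leader - x)
        moves.append(leader - A * D)
    x_gwo = np.mean(moves, axis=0)
    # Phase 2: WOA spiral (bubble-net) move around the best solution alpha.
    l = 2 * rng.random() - 1
    D_best = np.abs(alpha - x_gwo)
    return D_best * np.exp(l) * np.cos(2 * np.pi * l) + alpha

rng = np.random.default_rng(1)
x = np.zeros(2)
alpha, beta, delta = np.ones(2), np.full(2, 0.9), np.full(2, 0.8)
new_x = gwo_woa_step(x, alpha, beta, delta, a=0.5, rng=rng)
```

For hyperparameter tuning, each position vector would encode candidate values (e.g. learning rate, filter count), and the fitness would be the validation accuracy of the resulting CNN–GRU model.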

Related works

In this Section, a review of previous major TSA approaches is conducted. TSA combines TM and SA methods to analyze users' sentiments for each topic. SA can be described as a process of analyzing text or speech to find the sentiments of the author or speaker [49]. We discuss the previous work of TSA in two parts: Conventional approaches and Topic-Sentiment joint models.

In conventional TSA approaches, TM is performed first, followed by SA on each topic [5–18]. A serious problem is that these methods ignore the relationships between sentiments and domains. Lexicon-based approaches have been used in several studies in this scope. Ilyas et al. [7] examined users' tweets about Brexit with LDA and applied the VADER tool to analyze users' sentiments on each topic. Yang et al. [14] developed dynamic Topic and Sentiment Analysis Of User Reviews (TOUR) to analyze reviews of different applications; they used pre-trained GloVe vectors to detect opinion words and IDEA to automatically detect and label each topic. Kwon et al. [16] extracted the hidden topics from online reviews on Skytrax (airlinequality.com) and applied the Opinion Lexicon to detect the sentiments of reviews. Pathak et al. [13] used online latent semantic indexing with regularization constraints to extract the topics and applied a topic-level attention mechanism to detect the sentiment of each sentence. Zhang et al. [17] used LDA to extract the topics hidden in the datasets and utilized VADER and the number of positive and negative words to estimate the sentiment of each topic. Qiao and Williams [18] applied LDA to find the hidden topics in tweets on global warming and then utilized the NRC Word-Emotion Association Lexicon to estimate the overall polarity of tweets. Carvache et al. [8] and Abiola et al. [10] collected tweets on COVID-19 and used LDA to extract the hidden topics; the former employed SentiStrength, and the latter used VADER and TextBlob, to determine the sentiments corresponding to each topic. Additionally, Yin et al. [9] collected tweets on the COVID-19 vaccine, performed LDA to find the topics, and then utilized VADER to perform SA.

Several approaches also utilize machine learning models within conventional TSA. Ozyurt et al. [12] proposed an unsupervised aspect-based SA method named Sentence Segment LDA (SS-LDA), in which topics are assigned to sentence segments instead of words, as in LDA. A segment is the part of a sentence that covers a single aspect of a product. Jelodar et al. [5] employed LDA to extract topics discussed by Reddit users regarding the Coronavirus and took advantage of LSTM to categorize users' sentiments on each topic. Guo et al. [19] proposed a TSA method, called BJ-LDA, that uses LDA to perform topic detection and Maximum Entropy to separate aspect words from their corresponding opinion words, describing each brand from a detailed perspective. Uthirapathy and Sandanam [11] utilized LDA to detect the topics in tweets on climate change and then employed BERT to classify the sentiments. Garcia et al. [15] collected tweets on COVID-19 and applied the GSDMM algorithm to find hidden topics and the CrystalFeel method for SA.

Table 1 shows the list of conventional methods of TSA, including employed datasets and the pros/cons of each.

Table 1.

TSA: Conventional Approaches

References | Pros and cons | Dataset
Jelodar et al. [5] | + Does not extend the LDA and classifies the sentiments separately. + Uses the LSTM to classify sentiments. − Does not consider the semantic relationship between emotions and the domain (context) | Reddit
Ilyas et al. [7] | + Analyzes the relationship between the topics and the stock prices. − Does not consider the semantic relationship between emotions and scope | Twitter
SS-LDA [12] | + Assigns the topics to sentence segments, not words. − Does not consider the semantic relationship between emotions and the domain | Turkish reviews on smartphones and SemEval-2016
Pathak et al. [13] | + Uses Word2vec to extract the feature vectors. − Does not compare with the TSA approaches | SemEval-2017 Task 4 Subtask B, Twitter, Facebook
TOUR [14] | + Uses pre-trained word vectors to detect opinion words. − Does not evaluate TOUR by different metrics | Reviews on applications
Kwon et al. [16] | + Uses the Opinion Lexicon to estimate the sentiments. − Does not consider the relationships between the topics and the sentiments | Reviews from Skytrax (airlinequality.com)
Garcia et al. [15] | + Uses the GSDMM algorithm to find hidden topics and CrystalFeel for estimating sentiments. − Does not consider the relationships between the topics and the sentiments | Twitter
Yin et al. [9] | + Uses VADER to estimate the sentiments. − Does not consider the relationships between the topics and the sentiments | Twitter
Zhang et al. [17] | + Uses VADER to estimate the sentiments. − Does not consider the relationships between the topics and the sentiments | TripAdvisor and Yelp restaurant reviews
Qiao and Williams [18] | + Uses the NRC Word-Emotion Association Lexicon to estimate the sentiments. − Does not consider the relationships between the topics and the sentiments | Twitter
BJ-LDA [19] | + Uses Maximum Entropy to separate aspect words and their corresponding opinion words. − Does not use word embedding techniques | Japanese restaurant dataset; skincare product dataset
Carvache et al. [8] | + Uses SentiStrength to estimate the sentiments. − Does not consider the relationships between the topics and the sentiments | Twitter
Abiola et al. [10] | + Uses VADER and TextBlob to estimate the sentiments. − Does not consider the relationships between the topics and the sentiments | Twitter
Uthirapathy and Sandanam [11] | + Uses BERT to classify the sentiments. − Does not consider the relationships between the topics and the sentiments | Twitter

Other TSA approaches perform the TM and SA operations simultaneously [4, 21–31]. These approaches usually customize the LDA algorithm by adding a layer for SA. Most of them are unsupervised and do not provide good accuracy when there is insufficient training data or the training documents are short [4]. Lin et al. [21] developed the Joint Sentiment/Topic model (JST) to detect topic and sentiment simultaneously. JST is an entirely unsupervised method created by extending the LDA algorithm: a sentiment layer is added between the document and topic layers of LDA. The Topic Sentiment joint model with Word Embedding (TSWE) [24] is an unsupervised approach proposed by Fu et al. that extends JST; the Dirichlet multinomial component of JST is replaced with two other components, a sentiment-topic-to-word Dirichlet component and word embeddings. The Topic Sentiment Joint Model with Word Embedded Dependency (TSJM-WED) [25] extends TSWE to consider the dependency among words; it uses LSTM and GRU networks for training and preserves word dependencies. Fu et al. [4] developed the Weakly Supervised Topic Sentiment joint model with Word Embedding (WS-TSWE) using the HowNet dictionary and word embedding techniques.

Several TSA joint models benefit from multimodal information extracted from different sources. The Multimodal Joint Sentiment Topic (MJST) [26] model is a weakly supervised method that adds a sentiment layer to LDA and uses emoticons and another parameter, called the personality factor, to detect the sentiment of messages; MJST estimates the personality factor from the polarity of microbloggers' messages. Osmani et al. [22] developed three other methods by extending LDA with review-related parameters such as date, author, helpfulness, sentiment, and subtopic: Date-Sentiment LDA (DSLDA), Author-Date-Sentiment LDA (ADSLDA), and Pack-Author-Date-Sentiment LDA (PADSLDA). Helpfulness refers to the number of readers who found the review useful. JST-RR [23] is a unified generative model in which topic modeling is based on review texts, and sentiment prediction is obtained by combining review texts with overall ratings.

There are also several approaches that utilize machine learning in joint-model TSA. Dong et al. [29] developed the Unsupervised Topic Sentiment Joint model (UTSJ) to distinguish fake product reviews from real ones and analyze the topic sentiment of reviews based on LDA. UTSJ adds a sentiment layer to the LDA algorithm and takes advantage of SVM and Random Forest (RF) classifiers to detect real and fake reviews. Liu et al. [28] proposed Dynamic Topic-based Sentiment Analysis (DTSA), which simultaneously extracts topics and sentiments from users' news and comments and also models the gradual evolution of topics over time. Nimala and Jebakumar [30] applied the Sentiment Topic Model (STM) to analyze students' sentiments about professors; they applied it to the opinions of 4,000 students at the beginning and end of a course and observed better accuracy than other SA approaches. In STM, a sentiment dictionary is created simultaneously with SA. Pergola et al. [31] introduced the Topic-Dependent Attention Model (TDAM) for topic SA. TDAM assumes a general topic for the whole corpus and then uses an attention mechanism to find the topics and their relevant sentiments; a bidirectional GRU was employed to implement it.

Table 2 shows the list of these methods, including employed datasets and the pros/cons of each. As explained above, these approaches do not provide good accuracy when there is insufficient training data or the length of training documents is short [4].

Table 2.

TSA: Topic Sentiment Joint Models

References | Pros and cons | Dataset
JST [21] | + Presents the first joint sentiment/topic model. − Needs to first receive a list of positive and negative words as seeds | Movie Review dataset
TSWE [24] | + Utilizes word embeddings to consider the dependency between words. − Requires external sources such as English and Chinese Wikipedia | Amazon Product Review and Movie Review datasets
MJST [26] | + Considers the emoticons and microbloggers' personalities to extract sentiments. − Low accuracy | Sina Weibo
DTSA [28] | + Considers the dynamic nature of news and the comments of users. − Requires another tool for tagging data | News from the Guardian and Twitter
WS-TSWE [4] | + Utilizes word embeddings and the HowNet lexicon together to improve topic and sentiment detection. − Requires external sources such as English and Chinese Wikipedia | Amazon Product Review and Movie Review datasets
UTSJ [29] | + Detects deceptive comments. − Does not employ word embedding techniques | Yelp dataset
TSJM-WED [25] | + Utilizes word embeddings to consider the semantic dependency among words. + Uses GRU and LSTM to classify sentiments. − Requires external sources such as English and Chinese Wikipedia | Computer, hotel, book, and Movie Review datasets
STM [30] | + Uses the LDA method to classify the sentiments. − Needs a tool to first label the data with different sentiments | Student comments
TDAM [31] | + Uses multi-task learning to predict the sentiments and domain category. + Uses BiLSTM and BiGRU to classify sentiments | Amazon and Yelp
Osmani et al. [22] | + Extends the LDA by adding extra parameters such as the date, the author, the helpfulness, etc. − Requires external dictionaries | Amazon Reviews and sports magazines
JST-RR [23] | + Joint modeling of ratings and reviews. − Needs to first receive a list of positive and negative words as seeds | Amazon Reviews (HP, Lenovo, and Dell laptops)

Gap analysis

As discussed earlier in Sect. 2.5, conventional TSA approaches perform TM and SA sequentially. These methods train the classifier on irrelevant datasets or employ widely used lexicons to classify sentiments [5–18]. In contrast, other TSA approaches perform TM and SA simultaneously; they usually customize the LDA algorithm, add a layer for SA, and borrow a list of seeds along with their sentiments from domain-independent lexicons [4, 21–31]. In both cases, the polarity of domain-specific terms is not recognized correctly, because either the lexicons are not in the same domain as the topics or the ML models were trained on data from a different domain. As a result, the semantic relationships between sentiments and domains are ignored. Additionally, when training data are insufficient or the training documents are short, their accuracy decreases remarkably [4].

Some methods, such as TSWE [24], WS-TSWE [4], TSJM-WED [25], and TOUR [14], utilize word embeddings to capture the dependency among words. They use pre-trained Word2vec and GloVe vectors trained on Wikipedia and other resources. Since these word embeddings are not in the same context as the topics, they do not capture the semantic relationships between the training data and the topics. To the best of our knowledge, none of the TSA approaches employs semantic similarity to capture the semantic relationships between the topics and the training dataset.

Moreover, only TDAM [31], TSJM-WED [25], and Jelodar et al. [5] have employed RNNs such as LSTM, GRU, BiLSTM, and BiGRU, and none of them utilized CNNs or their combination with RNNs to improve the accuracy and effectiveness of TSA. Likewise, none of the TSA studies have used metaheuristic algorithms to fine-tune the hyperparameters of the DNN model.

ETSANet

ETSANet is a novel TSA approach that extracts the semantic similarity between topics and sentiments utilizing the Semantic Topic Vector, word and document embeddings, and DNNs. In real-world applications of TSA, a number of topics are detected from a huge collection of unlabeled texts, such as tweets, comments, and reviews; then a classifier, previously trained on an irrelevant labeled dataset, is used to classify the texts, so the semantic relationship between the topics and sentiments is ignored. Unlike this real-world setting, the existing TSA methods use the same dataset for topic detection and sentiment classification, in either a supervised or an unsupervised way. Supervised TSA methods require labels on the topic detection dataset, whereas in real-world applications the sentiments of topics are hidden in a huge number of unlabeled texts. Unsupervised TSA methods are fed a list of seeds and their sentiments, and usually these seeds are not in the same context as the discovered topics. To this end, ETSANet benefits from two datasets, a topic detection dataset and a sentiment classification dataset, as shown in Fig. 1. ETSANet extracts the hidden topics from the topic detection dataset and uses semantic relationships to extract, from the sentiment classification dataset, the samples that are in the same context as the topics. These semantically topic-related data are used to train the CNN–GRU model. Therefore, the sentiment classification dataset needs labels and should cover data on different topics.

Fig. 1.

Fig. 1

Workflow of ETSANet

As shown in Fig. 1, ETSANet has three main steps: (1) Topic Discovery, (2) STRDF, and (3) Sentiment Classification. After preprocessing the datasets, we use the coherence value to find the optimal number of topics; then, the topics hidden in the topic detection dataset are extracted using the LDA algorithm. In the next step, STRDF, a Doc2Vec model is created from the sentiment classification dataset, the document vectors of the topic words are extracted from this model, and the Semantic Topic Vector of each topic is created. Then, using the Semantic Topic Vectors, the Doc2Vec model, and the Cosine similarity measure, the documents in the same domain as the topics are extracted from the sentiment classification dataset and fed into the CNN–GRU model to be classified as positive or negative. Also, in the sentiment classification step, we employ the GWO–WOA metaheuristic algorithm to identify the optimal values of the hyperparameters of the CNN–GRU model. Each of these steps is discussed in detail in the following subsections.

Topic discovery

To find the optimal number of topics, we use the topic coherence measure proposed in [35], explained in Sect. 2.1. As described in Pseudocode 1, the LDA method is performed on the topic detection dataset for numbers of topics from 5 to 120. Then, we select the number of topics with the highest coherence value. Section 4.4, Fig. 3, shows the coherence values corresponding to various numbers of topics.

Fig. 3.

Fig. 3

Topic coherence diagram

STRDF

The STRDF method aims at finding the documents that are in the same domain as the topics, called semantically topic-related documents, which are used to train the CNN–GRU model in the next step. To do this, STRDF includes three steps: (1) Doc2Vec Model, (2) Semantic Topic Vector, and (3) Semantically Topic-Related Documents. As explained in detail in Sect. 3.2.1, we create a Doc2Vec model of the sentiment classification dataset to convert the topic words into document vectors; as proposed in [38], we create three Doc2Vec models, combine them to increase accuracy, and select the one with the lowest error. In the next step, we extract the document vectors of all words composing each topic individually. These document vectors are used to create the Semantic Topic Vector corresponding to each topic, which is the average of all document vectors of the words composing a topic, as explained in Sect. 3.2.2. Then, these Semantic Topic Vectors are utilized to find the semantically topic-related documents: as described in Sect. 3.2.3, we use the Cosine similarity measure to find the documents in the same domain as each Semantic Topic Vector. Later on, we use these semantically topic-related documents to train the CNN–GRU model. The following subsections describe these steps in more detail.

Doc2Vec model

In this step, the documents of the sentiment classification dataset are converted into feature vectors, or document embeddings. For this purpose, a Doc2Vec model of the sentiment classification dataset is constructed. As explained in Sect. 2.2, there are two forms of the Doc2Vec model: PV-DM and PV-DBOW. The first, analogous to the continuous bag of words, is more complex but shows higher performance; it either concatenates or averages all word embeddings of a document to calculate the document embedding. The second, similar to skip-gram, is simpler and usually leads to a higher error rate. To create a Doc2Vec model with the lowest error rate, we developed three Doc2Vec models: D2V-DM-Concat, D2V-DM-Average, and D2V-DBOW, and assessed them using a logistic regression model on the sentiment classification dataset, as suggested in [38].

Furthermore, Le and Mikolov suggested that concatenating the document embeddings created by the distributed memory (PV-DM) and distributed bag-of-words (PV-DBOW) models improves the performance of the Doc2Vec model [38]. To this end, we separately concatenated the two distributed memory models (D2V-DM-Concat and D2V-DM-Average) with D2V-DBOW, creating two concatenated models called D2V-BOW-Concat and D2V-BOW-Average. According to the evaluation results, which will be discussed in Sect. 4.4, D2V-BOW-Concat has the lowest error rate.

Semantic topic vector

Measuring the similarity between texts plays an essential role in information retrieval, document clustering, text summarization, etc. Lexical techniques determine similarity using character sequences and fall into two categories: character-based and term-based techniques. The most common term-based method is cosine similarity, which computes the cosine of the angle between two vectors; the smaller the angle between two vectors, the more similar they are [50]. The cosine of the angle between vectors v1 and v2, known as the similarity score, is calculated by Eq. (9):

$$\mathrm{SIM}(v_1,v_2)=\frac{\sum_{i=1}^{n} v_{1i}\,v_{2i}}{\sqrt{\sum_{i=1}^{n} v_{1i}^{2}}\times\sqrt{\sum_{i=1}^{n} v_{2i}^{2}}} \quad (9)$$

We take advantage of the cosine similarity to measure the similarity between topics and texts. To this end, we introduce the Semantic Topic Vector as a new concept representing all semantic aspects of a topic.
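Eq. (9) can be checked with a few lines of NumPy (the function name is ours):

```python
import numpy as np


def cosine_similarity(v1, v2):
    """Similarity score of Eq. (9): cosine of the angle between v1 and v2."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```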

Assume that we have a collection of k topics denoted by $T=\{t_1,t_2,\ldots,t_k\}$; each topic $t_i$ is a collection of top n words denoted by $t_i=\{w_{i1},w_{i2},\ldots,w_{in}\}$, where n is the number of top words of each topic. Moreover, let $DV_i=\{dv_{i1},dv_{i2},\ldots,dv_{in}\}$ be the collection of n document vectors corresponding to topic $t_i$, one document vector for every word, where $dv_{ij}$ is the document vector corresponding to $w_{ij}$, extracted from the Doc2Vec model created in Sect. 3.2.1. The Semantic Topic Vector, denoted by STV, is a feature vector corresponding to a topic; let $STV_i$ be the Semantic Topic Vector corresponding to topic $t_i$. Finally, we calculate $STV_i$ as follows:

$$STV_i=\frac{\sum_{j=1}^{n} dv_{ij}}{n} \quad (10)$$

As Eq. (10) shows, $STV_i$ is the arithmetic mean of the document vectors of all top words belonging to topic $t_i$. The basic idea of the Semantic Topic Vector comes from the fact that the top n words of a topic can come from different contexts, and the vector representing these words must cover all of them. For instance, let {movie, long, ghost, nice, woman} be the top 5 words of a topic. These words come from different contexts; their mean vector must cover and represent their features, so we calculate $STV_i$ for each topic $t_i$.
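Eq. (10) in code (a sketch; in ETSANet the document vectors would come from the selected Doc2Vec model):

```python
import numpy as np


def semantic_topic_vector(doc_vectors):
    """Eq. (10): the Semantic Topic Vector is the arithmetic mean of the
    document vectors of the top-n words of a topic."""
    return np.mean(np.asarray(doc_vectors, dtype=float), axis=0)
```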

Semantically topic-related documents

Subsequently, taking advantage of the Cosine similarity shown in Eq. (9) and the Doc2Vec model created in Sect. 3.2.1, for each topic $t_i$ the documents most similar to $STV_i$ (Eq. (10)) are extracted from the training dataset; these are the documents that have the smallest angle with $STV_i$ and consequently are in the same context as topic $t_i$. In addition, to ensure that all aspects of the topic are covered, we treat all words of a topic as a single document and also extract the documents having the smallest angle with it. As a result, we obtain a set of similar documents for each topic set. Section 4.4, Table 8, shows the specifications of these documents for each topic set. Pseudocode 2 presents the process of identifying topics and finding similar documents for each topic; the variable k represents the number of topics, and the variable n denotes the number of top words of each topic.
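The document-selection step of Pseudocode 2 can be sketched as a vectorized cosine ranking; the function name and the top_k default are ours (Sect. 4.4 later keeps the 1000 most similar positive and 1000 most similar negative samples):

```python
import numpy as np


def top_related_documents(stv, doc_vectors, top_k=2000):
    """Rank all documents by cosine similarity to a Semantic Topic Vector
    and return the indices and scores of the top_k most related ones."""
    stv = np.asarray(stv, dtype=float)
    M = np.asarray(doc_vectors, dtype=float)
    # cosine similarity of every document vector against the STV
    sims = M @ stv / (np.linalg.norm(M, axis=1) * np.linalg.norm(stv))
    order = np.argsort(-sims)[:top_k]  # descending similarity
    return order, sims[order]
```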

Table 8.

Comparison of STRDF with Cosine and Soft Cosine

Method | #Topics (K) | Positive similarity score (Mean, Std.) | Negative similarity score (Mean, Std.)
Cosine 40 0.903 0.16 0.944 0.11
Soft Cosine 0.714 0.22 0.726 0.2
STRDF 0.943 0.05 0.965 0.04
Cosine 50 0.925 0.11 0.951 0.07
Soft Cosine 0.707 0.175 0.727 0.19
STRDF 0.947 0.05 0.967 0.03
Cosine 60 0.893 0.06 0.955 0.02
Soft Cosine 0.710 0.17 0.725 0.19
STRDF 0.951 0.04 0.966 0.03
Cosine 65 0.924 0.05 0.945 0.03
Soft Cosine 0.707 0.18 0.728 0.20
STRDF 0.948 0.05 0.968 0.03
Cosine 70 0.904 0.05 0.944 0.04
Soft Cosine 0.714 0.17 0.734 0.20
STRDF 0.949 0.04 0.967 0.03

The best results are in bold

Sentiment classification

The sentiment classification part of ETSANet includes (1) The CNN–GRU model and (2) Hyperparameter Tuning.

CNN–GRU model

ETSANet employs a combination of CNN and GRU for sentiment classification. It has been shown that CNNs can improve text classification accuracy [41]; indeed, CNNs have a strong capacity for extracting local and deep features from text using convolution layers. On the other hand, RNNs are able to learn long-term dependencies, so they are appropriate for modeling sequential data such as text, since a sentence can be considered a sequence of words from left to right. The LSTM and GRU networks are two types of RNNs, but GRU networks provide less computational complexity and a simpler architecture than LSTM [43]. Considering the facts mentioned above and inspired by the results of [41], which show that combining CNN and GRU achieves higher accuracy in text classification tasks, ETSANet benefits from both CNN and GRU. As shown in Fig. 2, the proposed DNN architecture includes the embedding, convolution, max-pooling, GRU, and fully connected layers:

  1. Embedding layer: This layer receives the semantically topic-related data, as explained in Sect. 3.3.3, in the form of word embeddings. Assume v is the vocabulary size of the corpus and d is the size of a word embedding (dimension size); then an embedding matrix $EM\in\mathbb{R}^{d\times v}$ containing all words of the vocabulary is created. Subsequently, a sentence and its embedding can be represented as Eqs. (11) and (12), respectively:
    $$\text{Sentence}=(w_1,w_2,\ldots,w_l) \quad (11)$$
    $$\text{Sentence\_Embedding}=[we_1,we_2,\ldots,we_l],\quad \text{Sentence\_Embedding}\in\mathbb{R}^{d\times l} \quad (12)$$
    where $w_i$ indicates the i-th word of the sentence, l is the length of the sentence, and the column $we_i$ denotes the word embedding of $w_i$, $we_i=EM[w_i]$, $we_i\in\mathbb{R}^{d}$.

  2. Convolution layer: This layer extracts the local features. Suppose $K\in\mathbb{R}^{d\times w}$ is a kernel of size w, which is applied to each window of size w; a bias term b is added to the result of the convolutional operation, and a feature map $FM\in\mathbb{R}^{l-w+1}$ is created as follows:
    $$FM=(fm_1,fm_2,\ldots,fm_{l-w+1}),\quad FM\in\mathbb{R}^{l-w+1} \quad (13)$$
    The i-th element of the feature map is:
    $$fm_i=\sigma\left(EM[:,\,i{:}i{+}w]\odot K+b\right) \quad (14)$$
    where $\sigma$ is a non-linear activation function such as ReLU or tanh.
  3. Pooling layer: In the next step, the feature maps are fed into the pooling layer to find the essential features and reduce the dimensions. The pooling layer, widely used after CNN layers, performs dimension reduction and consequently decreases the computation time. ETSANet uses a Max-pooling layer with pool size 2, which converts a feature map of size $l-w+1$ to $\lfloor (l-w+1)/2 \rfloor$. The output of the pooling layer is:
    $$P=(p_1,p_2,\ldots,p_{\lfloor (l-w+1)/2\rfloor}),\quad P\in\mathbb{R}^{\lfloor (l-w+1)/2\rfloor} \quad (15)$$
    where $p_i$ is calculated as follows:
    $$p_i=\max(fm_{2i-1},\,fm_{2i}) \quad (16)$$
  4. GRU layer: As explained in Sect. 2.3, GRU networks are capable of capturing long-term dependencies and have less computational complexity and a simpler architecture than LSTM networks. In this step, the GRU receives the features obtained by the pooling layer, $P=(p_1,p_2,\ldots,p_{\lfloor (l-w+1)/2\rfloor})$, to find the long-term dependencies. The output of the GRU is $g\in\mathbb{R}^{n}$, the encoding of the complete sentence:
    $$g=\mathrm{GRU}(p_1,p_2,\ldots,p_{\lfloor (l-w+1)/2\rfloor}),\quad g\in\mathbb{R}^{n} \quad (17)$$
  5. Fully connected layer: The output of the GRU layer is sent to a fully connected layer that uses the sigmoid activation function. Passing the feature vector through the sigmoid function yields a probability score over sentiment classes. The sigmoid function is calculated as follows:
    $$\mathrm{sigmoid}(g)=\frac{1}{1+e^{-g}} \quad (18)$$
    where g denotes the advanced feature vector created by the GRU. Additionally, we use the binary cross-entropy between actual values and predicted classes as the loss for SA:
    $$\mathrm{Loss}=-\frac{1}{N}\sum_{i=1}^{N}\left[y_i\cdot\log p(y_i)+(1-y_i)\cdot\log\left(1-p(y_i)\right)\right] \quad (19)$$
    where $y$ denotes a class, $p(y)$ indicates its probability, and N is the number of outputs.
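The five layers above map directly onto a Keras model. The following is a minimal sketch using the hyperparameter values of Tables 4 and 10 as defaults; the vocabulary size, embedding dimension, and maximum sentence length are assumed arguments, not values from the paper:

```python
from tensorflow.keras import layers, models


def build_cnn_gru(vocab_size, embed_dim=100, max_len=200,
                  n_filters=32, kernel_size=5, pool_size=2, gru_units=20):
    """Sketch of the CNN–GRU architecture of Fig. 2."""
    model = models.Sequential([
        layers.Input(shape=(max_len,)),
        layers.Embedding(vocab_size, embed_dim),        # embedding layer
        layers.Conv1D(n_filters, kernel_size,           # local features
                      activation="relu"),
        layers.MaxPooling1D(pool_size=pool_size),       # dimension reduction
        layers.GRU(gru_units, activation="relu"),       # long-term dependencies
        layers.Dense(1, activation="sigmoid"),          # probability score
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

The Adam optimizer, binary cross-entropy loss, and activations match Table 4; the filter count, kernel size, pool size, and GRU units match the GWO–WOA results of Table 10.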
Fig. 2.

Fig. 2

CNN–GRU architecture of ETSANet

Pseudocode 3 shows the process of sentiment classification.

Hyperparameter tuning

ETSANet utilizes the hybrid serial GWO–WOA [32] to fine-tune the number of filters and kernel size of the convolutional layer, the pool size of the Max-pooling layer, and the number of units of the GRU layer. As described in Sect. 2.3, the hybrid serial GWO–WOA updates each solution in two phases per iteration: first, the solution is updated using the operators of GWO; second, the search is extended to other promising locations using the operators of WOA. The fitness function of the GWO–WOA is the loss function of the CNN–GRU model, binary cross-entropy (Eq. 19), since the goal of the optimization process is to minimize the loss of the model. Pseudocode 4 briefly describes the serial GWO–WOA used in ETSANet.
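For illustration, a simplified serial GWO–WOA loop is sketched below using the standard GWO and WOA update rules; the exact operators of [32] may differ, and a cheap analytic function stands in for the CNN–GRU loss as fitness:

```python
import numpy as np


def gwo_woa(fitness, dim, bounds, pop_size=20, iters=30, seed=1):
    """Serial GWO–WOA sketch: each wolf is first moved with the GWO
    update, then refined with a WOA update, keeping the better result."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (pop_size, dim))
    fit = np.array([fitness(x) for x in X])
    for t in range(iters):
        a = 2 - 2 * t / iters                      # linearly decreased 2 -> 0
        order = np.argsort(fit)
        alpha = X[order[0]].copy()                 # three best wolves lead
        beta, delta = X[order[1]].copy(), X[order[2]].copy()
        for i in range(pop_size):
            # GWO phase: move toward alpha, beta, and delta
            cand = np.zeros(dim)
            for leader in (alpha, beta, delta):
                A = 2 * a * rng.random(dim) - a
                C = 2 * rng.random(dim)
                cand += leader - A * np.abs(C * leader - X[i])
            cand /= 3.0
            # WOA phase: encircle the best solution or spiral around it
            if rng.random() < 0.5:
                A = 2 * a * rng.random(dim) - a
                C = 2 * rng.random(dim)
                cand = alpha - A * np.abs(C * alpha - cand)
            else:
                l = rng.uniform(-1, 1)
                D = np.abs(alpha - cand)
                cand = D * np.exp(l) * np.cos(2 * np.pi * l) + alpha
            cand = np.clip(cand, lo, hi)
            f = fitness(cand)
            if f < fit[i]:                          # greedy replacement
                X[i], fit[i] = cand, f
    best = np.argmin(fit)
    return X[best], fit[best]
```

In ETSANet, `fitness` would train the CNN–GRU model with the decoded hyperparameters and return its binary cross-entropy loss.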

Evaluation of ETSANet

This section describes the datasets, metrics, experimental settings, hyperparameter tuning, and the complete assessment of ETSANet in its different phases. Since ETSANet includes two major steps, STRDF (Semantically Topic-Related Documents Finder) and sentiment classification using the CNN–GRU network, we first compare the results of the STRDF method with other text similarity approaches; then, the sentiment classification part is compared with different DNNs. Ultimately, ETSANet is compared with existing TSA approaches, mainly introduced in Sect. 2.4, and the computational complexity of ETSANet is presented at the end of this section.

Data description

ETSANet employs two different datasets, a topic detection dataset for topic discovery and a sentiment classification dataset for training the CNN–GRU network. We have used the second version of the Pang and Lee movie review dataset [51] as the topic detection dataset, which is commonly used in TSA studies. We have combined two datasets with different topics as the sentiment classification dataset to show that ETSANet can find semantically similar training data in the same context as topics. Table 3 presents the description of each dataset.

Table 3.

Description of the datasets

Dataset | #negSamples | #posSamples | #totalSamples
Topic detection dataset:
Pang and Lee movie review dataset [51] | 1000 | 1000 | 2000
Sentiment classification dataset:
1. IMDB film review [52] | 25,000 | 25,000 | 50,000
2. Sentiment140 [53] | 800,000 | 800,000 | 1,600,000
Total | 825,000 | 825,000 | 1,650,000

A preprocessing module in ETSANet prepares these datasets. Text preprocessing plays an important role in SA, since user-generated texts are prone to errors and noise that can bias the models. In this step, punctuation, Unicode, special, and non-alphabetical characters are removed; stop words such as "he", "she", and "to" are eliminated; all letters are converted to lowercase; and, finally, stemming is performed.
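These steps can be sketched with the standard library alone; note that the actual pipeline uses NLTK stop words and the PorterStemmer (Sect. 4.3), so the tiny stop-word list and crude suffix stripper below are toy stand-ins:

```python
import re

# Toy stand-in for the NLTK stop-word list used in the paper.
STOP_WORDS = {"he", "she", "to", "the", "a", "an", "is", "and"}


def preprocess(text):
    """Lowercase, drop non-alphabetical characters, remove stop words,
    and apply a crude stem (sketch of ETSANet's preprocessing module)."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)   # punctuation, digits, special chars
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    # Crude plural stripper standing in for the PorterStemmer.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
```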

Metrics

In terms of evaluation metrics, Accuracy, Precision, Recall, and F1-score are the commonly adopted metrics for performance evaluation. Accuracy, obtained using Eq. (20), indicates the number of correct predictions relative to all predictions:

$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN} \quad (20)$$

In the above Equation:

  • TP: The number of samples that have been detected positive by the model and are indeed positive.

  • TN: The number of samples that have been detected negative by the model and are indeed negative.

  • FP: The number of samples that have been detected positive by the model but are, in fact, negative.

  • FN: The number of samples that have been detected negative by the model but are, in fact, positive.

The next metric is precision, which measures the fraction of samples predicted as a class that actually belong to that class:

$$\mathrm{Precision}=\frac{TP}{TP+FP} \quad (21)$$

Recall is another metric, which measures the fraction of samples of a class that are correctly predicted as that class:

$$\mathrm{Recall}=\frac{TP}{TP+FN} \quad (22)$$

The harmonic mean of precision and recall gives the F1-score, with values closer to 1 indicating that the classification is performed correctly:

$$\mathrm{F1\text{-}score}=\frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}} \quad (23)$$
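Eqs. (20)–(23) computed directly from the confusion-matrix counts (a sketch; the function name is ours):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 (Eqs. 20-23), positive class = 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```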

Experimental settings

In terms of hardware, ETSANet runs on the Google Colab platform equipped with a K80 GPU and 12 GB of RAM.

In the preprocessing step, the NLTK tool, a well-known Python library for NLP, was employed, and the PorterStemmer was used for stemming.

Since DNNs employ random initialization and give different results at each run, all algorithms are performed 10 times, and the average results are reported. Python programming language version 3.8 is utilized to implement ETSANet. Keras [54] library has also been employed to develop different kinds of DNNs. Additionally, we have used the NiaPy [55] and sklearn-nature-inspired-algorithms [56] libraries, well-known Python frameworks, to implement the nature-inspired algorithms.

We evaluate different parameter configurations employed in ETSANet for LDA topic discovery and CNN–GRU hyperparameter settings and fine-tune the models to achieve the optimum results.

To configure the LDA parameters, we follow the settings used in [21], which are common in the literature: we set β to 0.1 and α to 50/k, where k is the number of topics.

The sets of documents found by STRDF (for topic numbers 40, 50, 60, 65, and 70) are each split 80:20 into training and testing sets. Moreover, we use early stopping to prevent overfitting, setting the maximum number of epochs to 100, the monitored metric to validation accuracy, and the patience to 10. The remaining hyperparameters of the CNN–GRU model are listed in Table 4.
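This early-stopping configuration corresponds to a Keras callback like the following (restore_best_weights is our assumption, not stated in the text):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Watch validation accuracy, stop after 10 epochs without improvement,
# and cap training at 100 epochs via model.fit.
early_stop = EarlyStopping(monitor="val_accuracy", patience=10,
                           restore_best_weights=True)

# Usage (matching Table 4's epochs, batch size, and test split):
# model.fit(X_train, y_train, validation_split=0.2, epochs=100,
#           batch_size=32, callbacks=[early_stop])
```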

Table 4.

Hyperparameters of the CNN–GRU model

Hyperparameter Value
Trainable False
Optimizer Adam
Loss Binary Cross-Entropy
Learning Rate 0.01
Stride 1
Activation (Convolution Layer) ReLu
Activation (GRU Layer) ReLu
Activation (Dense Layer) Sigmoid
Epochs 100
Batch size 32
Test Split 0.2

We utilize the serial GWO–WOA [32] to fine-tune the hyperparameters of the CNN–GRU model, as explained in Sects. 2.3 and 3.4.2, and compare the results with the Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WOA), Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Firefly Algorithm (FA), since these algorithms are widely used for hyperparameter tuning. We use the loss function of the CNN–GRU model, binary cross-entropy, as the fitness function. As shown in Table 5, we configure the hyperparameters of GWO–WOA with the settings presented in [32] and those of the other algorithms with the NiaPy defaults [55], both common settings in the literature.

Table 5.

Configuration settings of the metaheuristic approaches

Algorithm Hyperparameter Value
GWO–WOA Population size 20
Number of runs 40
Iteration 30
GWO a Linearly decreased from 2 to 0
r1 A random vector in [0,1]
r2 A random vector in [0,1]
WOA a Linearly decreased from 2 to 0
r A random vector in [0,1]
GA Crossover Uniform crossover
Mutation Uniform mutation
Crossover rate 0.25
Mutation rate 0.25
PSO Min velocity − 1.5
Max velocity 1.5
C1 2
C2 2
FA β0 1
0.01
1

STRDF evaluation

In this section, we first find the number of topics as explained in Sect. 3.1. Then, we evaluate the developed Doc2Vec models to select the one with the least error for creating the Semantic Topic Vectors and finding similar documents, as explained in Sect. 3.2.1. Finally, we assess the documents found by the STRDF method in two ways: (1) comparing them with the documents found by the Cosine and Soft Cosine measures, and (2) comparing STRDF with other state-of-the-art document embedding methods.

To find the optimal number of topics, LDA has been performed on the Pang and Lee movie review dataset (the topic detection dataset) for numbers of topics between 5 and 120, and for each topic, the 15 words with the highest frequency in the topic have been extracted. Figure 3 outlines the coherence values for 5 to 120 topics. The coherence value keeps increasing with the number of topics but decreases after 60. Therefore, we have selected 40, 50, 60, 65, and 70 topics, since these achieved the highest coherence values.

As described in Sect. 3.2.1, we created three Doc2Vec models: D2V-DM-Concat, D2V-DM-Average, and D2V-DBOW. To assess these models, we trained a logistic regression model on the sentiment classification dataset; its input is the document embeddings, and its target is the sentiment labels. We then tuned the models based on vector size and learning rate. As shown in Figs. 4 and 5, both D2V-DM-Concat and D2V-DM-Average achieved their lowest error rate with a learning rate of 0.25, and D2V-DBOW with a learning rate of 0.5. D2V-DBOW and D2V-DM-Concat obtained their lowest error rate with a vector size of 200, and D2V-DM-Average with a vector size of 250.

Fig. 4.

Fig. 4

Hyperparameter tuning based on learning rate

Fig. 5.

Fig. 5

Hyperparameter tuning based on vector size

Subsequently, as suggested in [38] and explained in Sect. 3.2.1, we concatenated the two distributed memory models (D2V-DM-Concat and D2V-DM-Average) with D2V-DBOW separately, creating two concatenated models called D2V-BOW-Concat and D2V-BOW-Average. As shown in Table 6, the error rates of the two concatenated models decreased remarkably, and D2V-BOW-Concat achieved the lowest error. As a result, we used this model in the following steps.

Table 6.

Error rate of different models

Model name Error rate
D2V-DM-Concat 0.221
D2V-DM-Average 0.240
D2V-DBOW 0.248
D2V-BOW-Concat 0.102
D2V-BOW-Average 0.109

The best results are in bold

Now, we use the D2V-BOW-Concat model and the topic numbers (40, 50, 60, 65, and 70) from Fig. 3. We find the semantically topic-related documents as explained in Sect. 3.2.3, obtaining a set of similar documents for each topic set. To be comparable with existing methods that use the Pang and Lee dataset, which includes 1000 positive and 1000 negative samples, we select the 1000 positive and 1000 negative samples with the highest semantic relatedness (similarity score, i.e., cosine similarity) to the topics. Then, a dataset containing 2000 semantically topic-related documents is created for each set of topics. The specifications of these datasets, including the mean and standard deviation (std) of the similarity scores and the lengths of positive and negative samples for each topic set, are listed in Table 7.

Table 7.

Information of semantically topic-related documents corresponding to each set of topics

#Topics (K) | Positive samples (Mean, Std., Mean of length) | Negative samples (Mean, Std., Mean of length)
40 0.941 0.05 4.324 0.964 0.04 5.823
50 0.946 0.05 4.321 0.963 0.03 5.854
60 0.950 0.04 3.973 0.965 0.03 5.859
65 0.947 0.05 4.233 0.967 0.03 5.318
70 0.948 0.04 4.311 0.966 0.03 5.825

The best results are in bold

As listed in Table 7, in all cases the mean of the similarity scores is greater than 0.94 and the std of the similarity scores is small, showing that the results are semantically related to the topics. For positive samples, k equal to 60 and 70 gives the highest mean and smallest std, while for negative samples, k equal to 65 gives the highest mean and smallest std; hence, these are expected to yield more accurate sentiment classification. It is worth mentioning that the negative samples have a higher mean and smaller std of similarity scores than the positive samples, showing that the attitudes regarding these topics are more negative. Moreover, the mean length of the negative samples is greater than that of the positive samples, indicating that users tend to write more in negative cases.

To compare STRDF with other methods, we first compared it with the Cosine [50] and Soft Cosine [57] similarity measures. Unlike the traditional Cosine similarity, which compares feature vectors directly, the Soft Cosine measure considers the similarity of feature pairs. For each measure, we found the top 2000 most similar documents (1000 positive and 1000 negative samples). Table 8 shows the similarity scores of the retrieved documents; the datasets with a higher mean and a lower std of similarity scores are the most topic-related. For 50 and 60 topics, the documents obtained by the Cosine measure are close to STRDF on negative samples. For positive samples, the best result is obtained with 60 topics, where STRDF achieves a mean similarity score of 0.951 and a std of 0.04. For negative samples, the highest mean similarity score, 0.968 (std 0.03), is achieved by STRDF with 65 topics. In all cases, Soft Cosine obtains the lowest similarity scores.

In addition, we compared STRDF with other state-of-the-art sentence embedding methods, namely Sentence-BERT [58], ALBERT [59], and RoBERTa [60], as listed in Table 9. To this end, we used these embedding techniques to find the documents most similar to the topics. STRDF outperforms the other sentence embedding techniques across the different numbers of topics. Sentence-BERT and ALBERT achieved approximately similar results, while RoBERTa obtained the results closest to STRDF.

Table 9.

Comparison of STRDF with Sentence-Bert, ALBERT, and RoBERTa

Method | #Topics (K) | Positive (Mean, Std.) | Negative (Mean, Std.)
Sentence-Bert 40 0.678 0.14 0.673 0.16
ALBERT 0.678 0.14 0.673 0.16
RoBERTa 0.814 0.12 0.792 0.13
STRDF 0.943 0.05 0.965 0.04
Sentence-Bert 50 0.671 0.14 0.6640 0.14
ALBERT 0.671 0.14 0.664 0.14
RoBERTa 0.802 0.11 0.802 0.14
STRDF 0.947 0.05 0.967 0.03
Sentence-Bert 60 0.678 0.14 0.674 0.15
ALBERT 0.678 0.14 0.674 0.15
RoBERTa 0.809 0.10 0.794 0.11
STRDF 0.951 0.04 0.966 0.03
Sentence-Bert 65 0.679 0.15 0.6725 0.14
ALBERT 0.679 0.14 0.672 0.15
RoBERTa 0.810 0.13 0.796 0.11
STRDF 0.948 0.05 0.968 0.03
Sentence-Bert 70 0.678 0.15 0.669 0.16
ALBERT 0.678 0.1 0.679 0.1
RoBERTa 0.816 0.10 0.803 0.13
STRDF 0.949 0.04 0.967 0.03

The best results are in bold

Sentiment classification evaluation

As explained in Sect. 3.3.2, we use the GWO–WOA algorithm to fine-tune the hyperparameters of the classifier. As can be seen in Table 7, Sect. 4.4, STRDF found documents semantically related to the topic numbers with the highest coherence values (40, 50, 60, 65, and 70). We fine-tune the hyperparameters of the CNN–GRU model for each set of topic-related documents individually, including the number of filters (candidate values: 32 and 64), kernel size (candidate values: 3, 4, 5, 6, 7, and 8), pool size (candidate values: 2, 3, and 4), and the number of GRU units (candidate values: 10, 15, 20, and 25). As listed in Table 10, GWO–WOA achieved the highest accuracy for the various values of K with the hyperparameter values shown.

Table 10.

The result of hyperparameter tuning

Algorithms Hyperparameters #Topics
K = 40 K = 50 K = 60 K = 65 K = 70
GWO–WOA The number of filters 32 32 32 32 32
Kernel size 5 5 5 8 8
Pool size 2 2 2 2 2
Number of GRU units 20 20 20 20 20
Accuracy 0.842 0.851 0.843 0.863 0.867
GWO The number of filters 32 32 32 32 32
Kernel size 4 4 4 3 3
Pool size 2 2 2 2 2
Number of GRU units 15 15 15 15 15
Accuracy 0.833 0.812 0.834 0.859 0.854
WOA The number of filters 32 32 32 32 32
Kernel size 5 5 5 6 6
Pool size 2 2 2 2 2
Number of GRU units 15 15 15 15 15
Accuracy 0.818 0.826 0.841 0.852 0.846
GA The number of filters 32 32 32 32 32
Kernel size 3 3 4 4 4
Pool size 2 2 2 2 2
Number of GRU 20 20 20 20 20
Accuracy 0.801 0.827 0.836 0.847 0.852
PSO The number of filters 32 32 32 32 32
Kernel size 7 7 7 7 7
Pool size 2 2 2 2 2
Number of GRU units 25 25 25 25 25
Accuracy 0.795 0.784 0.819 0.826 0.814
FA The number of filters 32 32 32 32 32
Kernel size 4 4 4 4 4
Pool size 2 2 2 2 2
Number of GRU 10 10 10 10 10
Accuracy 0.793 0.789 0.809 0.828 0.832

The best results are in bold

Now, the classification part of ETSANet is compared with other DNNs, such as LSTM, BiLSTM, GRU, BiGRU, and CNNs, and different combinations of them. As explained in Sect. 3.1, Topic Discovery, the optimal topic numbers are 40, 50, 60, 65, and 70. Therefore, we fed the documents extracted by STRDF for these topic numbers into the different networks and compared the results based on accuracy, precision, recall, and F-score.

As can be seen in Table 11, the last four methods (CNN-BiGRU, CNN-LSTM, CNN-BiLSTM, and ETSANet), which combine CNNs and RNNs, achieved better results. CNNs can extract local and deep features from text using convolution layers, improving text classification accuracy [40]. RNNs, on the other hand, can learn long-term dependencies, which makes them suitable for sequential data such as time series and text. LSTM, BiLSTM, GRU, and BiGRU are RNN variants that process text as a sequence of words. CNN-LSTM and CNN–GRU performed better than CNN-BiLSTM and CNN-BiGRU. Additionally, CNN–GRU achieved higher accuracy than CNN-LSTM, since GRU networks are similar to LSTMs but use a simpler architecture with lower computational complexity [42]. Considering these facts and the results of [40], which showed that CNN–GRU achieved higher accuracy than other models in text classification, we conclude that combining CNNs and GRUs can improve accuracy. The DNN architecture employed in ETSANet performs best across the topic numbers (40, 50, 60, 65, and 70); with 70 topics, ETSANet obtains the highest accuracy (0.867). Only when the number of topics equals 50 does CNN-LSTM obtain higher accuracy than CNN–GRU. Like GRU, LSTM is a kind of RNN that uses a memory cell to store the previous state.
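To make the layer stack concrete, the framework-free helper below traces how a token sequence's shape changes through an embedding, Conv1D, MaxPooling1D, GRU, and dense layer. It is an illustrative sketch, not the authors' implementation: the defaults mirror the best K = 70 configuration from Table 10, 'valid' convolution padding is assumed, and all names are ours.

```python
def cnn_gru_shapes(seq_len, embed_dim, num_filters=32, kernel_size=8,
                   pool_size=2, gru_units=20):
    """Trace per-layer output shapes of a Conv1D -> MaxPooling1D -> GRU stack.

    Defaults mirror the best K = 70 configuration in Table 10; 'valid'
    convolution padding is assumed."""
    conv_len = seq_len - kernel_size + 1   # Conv1D steps with valid padding
    pool_len = conv_len // pool_size       # MaxPooling1D downsamples the steps
    return {
        "embedding": (seq_len, embed_dim),  # one dense vector per token
        "conv1d": (conv_len, num_filters),  # local n-gram features from CNN
        "maxpool": (pool_len, num_filters),
        "gru": (gru_units,),                # final GRU hidden state
        "dense": (1,),                      # sigmoid sentiment score
    }

# e.g. a 100-token review with 300-dimensional embeddings:
print(cnn_gru_shapes(seq_len=100, embed_dim=300))
```

The trace shows the division of labor the paragraph describes: the convolution and pooling layers compress the sequence into local features, and the GRU summarizes the remaining sequence into a single state for classification.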

Table 11.

Comparison of ETSANet's classification based on CNN–GRU with baseline classifiers

| Model | K | Accuracy | Precision (Pos) | Recall (Pos) | F-Score (Pos) | Precision (Neg) | Recall (Neg) | F-Score (Neg) |
|-------|---|----------|-----------------|--------------|---------------|-----------------|--------------|---------------|
| CNN | 40 | 0.840 | 0.881 | 0.812 | 0.845 | 0.831 | 0.823 | 0.827 |
| CNN | 50 | 0.821 | 0.813 | 0.762 | 0.787 | 0.723 | 0.724 | 0.723 |
| CNN | 60 | 0.811 | 0.725 | 0.815 | 0.767 | 0.656 | 0.788 | 0.716 |
| CNN | 65 | 0.823 | 0.726 | 0.845 | 0.781 | 0.658 | 0.752 | 0.702 |
| CNN | 70 | 0.814 | 0.799 | 0.723 | 0.759 | 0.652 | 0.524 | 0.581 |
| GRU | 40 | 0.811 | 0.659 | 0.563 | 0.607 | 0.564 | 0.523 | 0.543 |
| GRU | 50 | 0.822 | 0.624 | 0.598 | 0.611 | 0.810 | 0.802 | 0.806 |
| GRU | 60 | 0.820 | 0.715 | 0.625 | 0.667 | 0.651 | 0.824 | 0.727 |
| GRU | 65 | 0.834 | 0.727 | 0.678 | 0.702 | 0.756 | 0.616 | 0.679 |
| GRU | 70 | 0.833 | 0.765 | 0.642 | 0.698 | 0.814 | 0.794 | 0.804 |
| BiGRU | 40 | 0.763 | 0.654 | 0.612 | 0.632 | 0.725 | 0.765 | 0.744 |
| BiGRU | 50 | 0.733 | 0.624 | 0.819 | 0.708 | 0.762 | 0.745 | 0.753 |
| BiGRU | 60 | 0.741 | 0.563 | 0.823 | 0.669 | 0.723 | 0.722 | 0.722 |
| BiGRU | 65 | 0.742 | 0.619 | 0.594 | 0.606 | 0.618 | 0.716 | 0.663 |
| BiGRU | 70 | 0.713 | 0.584 | 0.514 | 0.547 | 0.614 | 0.662 | 0.637 |
| LSTM | 40 | 0.772 | 0.715 | 0.614 | 0.661 | 0.653 | 0.689 | 0.671 |
| LSTM | 50 | 0.762 | 0.765 | 0.587 | 0.664 | 0.681 | 0.675 | 0.678 |
| LSTM | 60 | 0.752 | 0.699 | 0.823 | 0.756 | 0.593 | 0.618 | 0.605 |
| LSTM | 65 | 0.742 | 0.792 | 0.541 | 0.643 | 0.695 | 0.612 | 0.651 |
| LSTM | 70 | 0.812 | 0.796 | 0.815 | 0.805 | 0.634 | 0.592 | 0.612 |
| BiLSTM | 40 | 0.773 | 0.652 | 0.584 | 0.616 | 0.674 | 0.527 | 0.592 |
| BiLSTM | 50 | 0.754 | 0.726 | 0.662 | 0.693 | 0.671 | 0.624 | 0.647 |
| BiLSTM | 60 | 0.763 | 0.641 | 0.563 | 0.599 | 0.658 | 0.875 | 0.751 |
| BiLSTM | 65 | 0.732 | 0.634 | 0.634 | 0.634 | 0.856 | 0.654 | 0.741 |
| BiLSTM | 70 | 0.734 | 0.631 | 0.569 | 0.598 | 0.587 | 0.685 | 0.632 |
| CNN-BiGRU | 40 | 0.825 | 0.845 | 0.825 | 0.835 | 0.865 | 0.835 | 0.850 |
| CNN-BiGRU | 50 | 0.830 | 0.846 | 0.895 | 0.870 | 0.825 | 0.853 | 0.839 |
| CNN-BiGRU | 60 | 0.826 | 0.878 | 0.715 | 0.788 | 0.845 | 0.735 | 0.786 |
| CNN-BiGRU | 65 | 0.828 | 0.826 | 0.743 | 0.782 | 0.826 | 0.741 | 0.781 |
| CNN-BiGRU | 70 | 0.815 | 0.841 | 0.659 | 0.739 | 0.842 | 0.832 | 0.837 |
| CNN-LSTM | 40 | 0.831 | 0.841 | 0.782 | 0.810 | 0.896 | 0.866 | 0.881 |
| CNN-LSTM | 50 | 0.855 | 0.882 | 0.875 | 0.878 | 0.761 | 0.842 | 0.799 |
| CNN-LSTM | 60 | 0.841 | 0.851 | 0.842 | 0.846 | 0.834 | 0.833 | 0.833 |
| CNN-LSTM | 65 | 0.835 | 0.834 | 0.831 | 0.832 | 0.874 | 0.814 | 0.843 |
| CNN-LSTM | 70 | 0.856 | 0.814 | 0.812 | 0.813 | 0.825 | 0.741 | 0.781 |
| CNN-BiLSTM | 40 | 0.814 | 0.795 | 0.756 | 0.775 | 0.742 | 0.752 | 0.747 |
| CNN-BiLSTM | 50 | 0.815 | 0.723 | 0.700 | 0.711 | 0.716 | 0.822 | 0.765 |
| CNN-BiLSTM | 60 | 0.831 | 0.738 | 0.612 | 0.669 | 0.823 | 0.755 | 0.788 |
| CNN-BiLSTM | 65 | 0.822 | 0.714 | 0.645 | 0.678 | 0.824 | 0.701 | 0.758 |
| CNN-BiLSTM | 70 | 0.825 | 0.752 | 0.723 | 0.737 | 0.802 | 0.723 | 0.760 |
| ETSANet (CNN–GRU) | 40 | 0.842 | 0.824 | 0.865 | 0.844 | 0.879 | 0.886 | 0.882 |
| ETSANet (CNN–GRU) | 50 | 0.851 | 0.885 | 0.876 | 0.880 | 0.849 | 0.819 | 0.834 |
| ETSANet (CNN–GRU) | 60 | 0.843 | 0.873 | 0.823 | 0.847 | 0.865 | 0.854 | 0.859 |
| ETSANet (CNN–GRU) | 65 | 0.863 | 0.880 | 0.871 | 0.875 | 0.890 | 0.880 | 0.885 |
| ETSANet (CNN–GRU) | 70 | 0.867 | 0.823 | 0.812 | 0.817 | 0.823 | 0.894 | 0.857 |

The best results are in bold

Comparison of ETSANet with existing TSA approaches

This section compares ETSANet with other similar TSA approaches. To this end, we run ETSANet with topic numbers of 8, 20, 40, and 80, since the baseline methods used these topic numbers. JST [21] is a fully unsupervised method that adds a sentiment layer to LDA, creating a joint sentiment-topic model. TSWE [24], WS-TSWE [4], and TSJM-WED [25] use word embeddings and are the approaches most similar to ETSANet. TSWE [24] is based on JST and word embeddings; it has two versions, TSWE−P and TSWE+P, where the latter incorporates a sentiment prior.

WS-TSWE [4] is a weakly supervised topic-sentiment method that utilizes word embeddings and the HowNet lexicon. TSJM-WED [25] has four versions: TSJM-WED-L and TSJM-WED-G use LSTM and GRU, respectively, to classify sentiments, while TSJM-WED-L′ and TSJM-WED-G′ are similar but do not employ word embeddings. Table 12 shows the best accuracy reported by these methods on the Pang and Lee dataset. As shown in Table 12, ETSANet outperforms the baselines under similar conditions.

Table 12.

Accuracy comparison of ETSANet with the existing TSA methods

| Method | K = 8 | K = 20 | K = 40 | K = 80 |
|--------|-------|--------|--------|--------|
| JST [21] | 0.795 | 0.804 | 0.804 | 0.832 |
| TSWE+P [24] | 0.810 | 0.800 | 0.791 | 0.785 |
| TSWE−P [24] | 0.728 | 0.800 | 0.728 | 0.795 |
| WS-TSWE [4] | 0.842 | 0.841 | 0.824 | 0.790 |
| TSJM-WED-L [25] | 0.849 | 0.839 | 0.836 | 0.812 |
| TSJM-WED-G [25] | 0.842 | 0.835 | 0.835 | 0.805 |
| TSJM-WED-L′ [25] | 0.849 | 0.837 | 0.832 | 0.810 |
| TSJM-WED-G′ [25] | 0.840 | 0.834 | 0.829 | 0.802 |
| ETSANet | 0.853 | 0.845 | 0.842 | 0.823 |

The best results are in bold

ETSANet is a TSA approach that employs training data in the same context as the topic. Existing TSA methods suffer from using training data outside the topics' context; ETSANet addresses this problem by considering the semantic relationship between the detected topics and the training data. For this purpose, it uses the Semantic Topic Vector, which covers the semantic aspects of a topic, together with the Doc2Vec document embedding model. ETSANet transforms all the detected topics and the training data into a vector space and finds the training documents that share a context with the detected topics using the cosine similarity measure. This makes it more accurate than existing TSA methods. Moreover, ETSANet benefits from combining two DNNs, CNN and GRU, which yields more accurate results.
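The selection step above can be sketched in a few lines, assuming the Semantic Topic Vector and the document vectors have already been produced by a Doc2Vec model; the 0.5 threshold and all names below are illustrative assumptions, not values from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def topic_related_docs(topic_vec, doc_vecs, threshold=0.5):
    """Indices of training documents whose embedding is close enough to the
    Semantic Topic Vector to count as being in the topic's context."""
    return [i for i, d in enumerate(doc_vecs)
            if cosine(topic_vec, d) >= threshold]

# Toy 2-d vectors standing in for Doc2Vec embeddings:
topic = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(topic_related_docs(topic, docs))  # documents 0 and 2 share the context
```

Only the documents passing this filter would then be used to train the CNN–GRU classifier for that topic.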

Computational complexity

The complexity of ETSANet is non-trivial and depends on the building blocks of the method: LDA, Doc2Vec, cosine similarity, GWO–WOA, and the CNN–GRU classifier.

The complexity of LDA is calculated using the following equation:

O_LDA = O(N × L × K)    (24)

where N, L, and K denote the number of documents, length of topics, and the number of topics, respectively.

The complexity of the Doc2Vec embedding method is linear since it is a single-layer model; hence, it is O(N), where N is the number of documents.

The complexity of the cosine similarity step is O(L × K), since the cosine is computed K times (once per topic) over vectors of length L.

The complexity of the optimization part of ETSANet, GWO–WOA, depends on the number of search agents (A) and the number of iterations (I), and can be expressed as O(I × A).

Finally, the complexities of the CNN and the GRU are O(s × n × d²) and O(n × d²), where s, n, and d denote the kernel size, the sequence length, and the representation dimension, respectively [61]. So, the complexity of CNN–GRU is calculated as below:

O_CNN–GRU = O(s × n × d²) + O(n × d²)    (25)

Ultimately, the complexity of the proposed method is as follows:

O_ETSANet = O(N × L × K) + O(N) + O(L × K) + O(I × A) + O(s × n × d²) + O(n × d²)    (26)

Since I, A, and s are constants, the complexity reduces to:

O_ETSANet = O(N × L × K) + O(N) + O(L × K) + 2 × O(n × d²)    (27)

In real-world datasets, N is much larger than L and K: the number of samples (N) is always far higher than the number of topics (K) and the length of topics (L), so L and K can be omitted from Eq. (27). Therefore, the complexity of ETSANet is:

O_ETSANet = O(N) + O(n × d²)    (28)

Conclusions and future work

Existing TSA approaches suffer from a lack of semantically topic-related data, because sentiment terms carry different polarities in different contexts. Consequently, ML models trained on a specific domain cannot be employed in other domains. In addition, domain-independent lexicons cannot correctly determine the sentiment of domain-dependent terms. Researchers have proposed conventional approaches and topic-sentiment joint models to address this problem. Some conventional TSA approaches utilize ML models previously trained on an irrelevant dataset and therefore do not provide acceptable accuracy. Other conventional TSA approaches employ domain-independent lexicons to classify sentiments; these lexicons cannot correctly recognize the sentiments of domain-dependent terms. Topic-sentiment joint models perform TM and SA simultaneously, but they also take as seeds a list of words and their sentiments from domain-independent lexicons. Hence, they cannot find the polarity of domain-dependent words effectively.

This paper proposes ETSANet, a new supervised TSA approach based on semantic similarity, document embedding, and DNNs. ETSANet utilizes a document embedding technique to create dense vectors. First, it discovers the hidden topics of the corpus using the LDA algorithm. Next, the STRDF method uses the Semantic Topic Vector, document embeddings, and the cosine similarity measure to detect the documents in the same context as the topics. Finally, the proposed DNN, a combination of CNN and GRU, is trained with these semantically topic-related documents for sentiment classification.

Moreover, a hybrid metaheuristic algorithm composed of GWO and WOA has been used to fine-tune the hyperparameters of the CNN–GRU model. ETSANet has been evaluated against several existing TSA approaches, and the evaluation results reveal that it achieves higher accuracy than the current methods.

For an exhaustive evaluation of the proposed method, ETSANet has been assessed in four steps. First, different Doc2Vec models were compared in terms of error rate, with D2V-BOW-Concat showing the lowest error, 0.102. Second, STRDF was assessed against cosine, soft cosine, Sentence-BERT, ALBERT, and RoBERTa, outperforming these methods in terms of similarity score (0.968). Third, the classification part of ETSANet, the CNN–GRU deep neural network, was evaluated in terms of accuracy, precision, recall, and F-score against other baseline classifiers; ETSANet achieved the highest accuracy (0.867). Fourth, ETSANet was compared with existing TSA methods, and the results demonstrate that it increases the accuracy of the state-of-the-art methods by 1.92%.

As future work, other variants of the LDA algorithm could replace the original LDA to improve the topic detection phase. Moreover, other similarity measures, such as semantic or character-based measures, can be applied to find semantically similar documents. Furthermore, ETSANet requires labeled data related to different topics, which are hard to obtain in real-world applications, so extending it to exploit unlabeled data is a promising direction.

Authors' contributions

AS: Concept, design, methodology, implementation, and evaluation. RR: Concept, design, verification, validation, and editing. RN: Editing and consultancy.

Funding

The authors did not receive support from any organization for the submitted work.

Data availability

All data generated or analyzed during this study are included in this published article.

Declarations

Conflict of interest

The authors declare no conflicts of interest regarding the publication of this paper.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Azam Seilsepour, Email: aza.seilsepour.eng@iauctb.ac.ir.

Reza Ravanmehr, Email: r.ravanmehr@iauctb.ac.ir.

Ramin Nassiri, Email: r_nasiri@iauctb.ac.ir.

References

  • 1.Balakrishnan V, Shi Z, Law CL, Lim R, Teh LL, Fan Y. A deep learning approach in predicting products’ sentiment ratings: a comparative analysis. J Supercomput. 2022;78(5):7206–7226. doi: 10.1007/s11227-021-04169-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Seilsepour A, Ravanmehr R, Sima HR. 2016 Olympic games on Twitter: sentiment analysis of sports fans tweets using big data framework. J Adv Comput Eng Technol. 2019;5(3):143–160. [Google Scholar]
  • 3.Wankhade M, Annavarapu CSR, Verma MK. CBVoSD: context based vectors over sentiment domain ensemble model for review classification. J Supercomput. 2022;78(5):6411–6447. doi: 10.1007/s11227-021-04132-5. [DOI] [Google Scholar]
  • 4.Fu X, Sun X, Wu H, Cui L, Huang JZ. Weakly supervised topic sentiment joint model with word embeddings. Knowl-Based Syst. 2018;147:43–54. doi: 10.1016/j.knosys.2018.02.012. [DOI] [Google Scholar]
  • 5.Jelodar H, Wang Y, Orji R, Huang S. Deep sentiment classification and topic discovery on novel coronavirus or COVID-19 online discussions: NLP using LSTM recurrent neural network approach. IEEE J Biomed Heal Inform. 2020;24(10):2733–2742. doi: 10.1109/JBHI.2020.3001216. [DOI] [PubMed] [Google Scholar]
  • 6.Jelodar H, et al. Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. Multimed Tools Appl. 2019;78(11):15169–15211. doi: 10.1007/s11042-018-6894-4. [DOI] [Google Scholar]
  • 7.Ilyas SHW, Soomro ZT, Anwar A, Shahzad H, Yaqub U (2020) Analyzing brexit’s impact using sentiment analysis and topic modeling on twitter discussion. In: PervasiveHealth: Pervasive Computing Technologies for Healthcare, pp 1–6. 10.1145/3396956.3396973
  • 8.Carvache-Franco O, Carvache-Franco M, Carvache-Franco W, Iturralde K. Topic and sentiment analysis of crisis communications about the COVID-19 pandemic in Twitter’s tourism hashtags. Tour Hosp Res. 2022;23(1):44–59. doi: 10.1177/14673584221085470. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Yin H, Song X, Yang S, Li J. Sentiment analysis and topic modeling for COVID-19 vaccine discussions. World Wide Web. 2022;25(3):1067–1083. doi: 10.1007/s11280-022-01029-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Abiola O, Abayomi-Alli A, Tale OA, Misra S, Abayomi-Alli O. Sentiment analysis of COVID-19 tweets from selected hashtags in Nigeria using VADER and Text Blob analyser. J Electr Syst Inf Technol. 2023;10(1):5. doi: 10.1186/s43067-023-00070-9. [DOI] [Google Scholar]
  • 11.Uthirapathy SE, Sandanam D. Topic modelling and opinion analysis on climate change Twitter data using LDA And BERT model. Procedia Comput Sci. 2023;218:908–917. doi: 10.1016/J.PROCS.2023.01.071. [DOI] [Google Scholar]
  • 12.Ozyurt B, Akcayol M. A new topic modeling based approach for aspect extraction in aspect based sentiment analysis: SS-LDA. Expert Syst Appl. 2020;168:114231. doi: 10.1016/j.eswa.2020.114231. [DOI] [Google Scholar]
  • 13.Pathak AR, Pandey M, Rautaray S. Topic-level sentiment analysis of social media data using deep learning. Appl Soft Comput. 2021;108:107440. doi: 10.1016/J.ASOC.2021.107440. [DOI] [Google Scholar]
  • 14.Yang T, Gao C, Zang J, Lo D, Lyu MR (2021) TOUR: dynamic topic and sentiment analysis of user reviews for assisting app release. arXiv:2103.15774 [cs]
  • 15.Garcia K, Berton L. Topic detection and sentiment analysis in Twitter content related to COVID-19 from Brazil and the USA. Appl Soft Comput. 2021;101:107057. doi: 10.1016/j.asoc.2020.107057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Kwon HJ, Ban HJ, Jun JK, Kim HS. Topic modeling and sentiment analysis of online review for airlines. Information. 2021 doi: 10.3390/info12020078. [DOI] [Google Scholar]
  • 17.Zhang S, Ly L, Mach N, Amaya C. Topic modeling and sentiment analysis of yelp restaurant reviews. Int J Inf Syst Serv Sect (IJISSS) 2022;14(1):1–16. doi: 10.4018/IJISSS.295872. [DOI] [Google Scholar]
  • 18.Qiao F, Williams J. Topic modelling and sentiment analysis of global warming tweets: evidence from big data analysis. J Organ End User Comput. 2022;34(3):1–18. doi: 10.4018/JOEUC.294901. [DOI] [Google Scholar]
  • 19.Guo Y, Wang F, Xing C, Lu X. Mining multi-brand characteristics from online reviews for competitive analysis: a brand joint model using latent Dirichlet allocation. Electron Commer Res Appl. 2022;53:101141. doi: 10.1016/j.elerap.2022.101141. [DOI] [Google Scholar]
  • 20.Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. J Mach Learn Res. 2003;3(Jan):993–1022. [Google Scholar]
  • 21.Lin C, He Y (2009) Joint sentiment/topic model for sentiment analysis. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp 375–384
  • 22.Osmani A, Mohasefi JB, Gharehchopogh FS (2020) Enriched Latent Dirichlet allocation for sentiment analysis. Expert Syst e12527
  • 23.Liang Q, Ranganathan S, Wang K, Deng X. JST-RR model: joint modeling of ratings and reviews in sentiment-topic prediction. Technometrics. 2023;65(1):57–69. doi: 10.1080/00401706.2022.2063187. [DOI] [Google Scholar]
  • 24.Fu X, Wu H, Cui L (2016) Topic sentiment joint model with word embeddings. In: DMNLP@ PKDD/ECML, pp 41–48
  • 25.Xie W, Fu X, Zhang X, Lu Y, Wei Y, Yang J. Topic sentiment analysis using words embeddings dependency in edge social system. Trans Emerg Telecommun Technol. 2019 doi: 10.1002/ett.3817. [DOI] [Google Scholar]
  • 26.Huang F, Zhang S, Zhang J, Yu G. Multimodal learning for topic sentiment analysis in microblogging. Neurocomputing. 2017;253:144–153. doi: 10.1016/j.neucom.2016.10.086. [DOI] [Google Scholar]
  • 27.Chen H, Cao G, Chen J, Ding J (2019) A practical framework for evaluating the quality of knowledge graph. In: Knowledge Graph and Semantic Computing: Knowledge Computing and Language Understanding, pp 111–122
  • 28.Liu P, Gulla JA, Zhang L. A joint model for analyzing topic and sentiment dynamics from large-scale online news. World Wide Web. 2018;21(4):1117–1139. doi: 10.1007/s11280-017-0474-9. [DOI] [Google Scholar]
  • 29.Yu Dong L, et al. An unsupervised topic-sentiment joint probabilistic model for detecting deceptive reviews. Expert Syst Appl. 2018;114:210–223. doi: 10.1016/j.eswa.2018.07.005. [DOI] [Google Scholar]
  • 30.Nimala K, Jebakumar R. Sentiment topic emotion model on students feedback for educational benefits and practices. Behav Inf Technol. 2021;40(3):311–319. doi: 10.1080/0144929X.2019.1687756. [DOI] [Google Scholar]
  • 31.Pergola G, Gui L, He Y. TDAM: a topic-dependent attention model for sentiment analysis. Inf Process Manag. 2019;56(6):102084. doi: 10.1016/j.ipm.2019.102084. [DOI] [Google Scholar]
  • 32.Mafarja M, Qasem A, Heidari AA, Aljarah I, Faris H, Mirjalili S. Efficient hybrid nature-inspired binary optimizers for feature selection. Cogn Comput. 2020;12(1):150–175. doi: 10.1007/s12559-019-09668-6. [DOI] [Google Scholar]
  • 33.Faris H, Aljarah I, Al-Betar MA, Mirjalili S. Grey wolf optimizer: a review of recent variants and applications. Neural Comput Appl. 2018;30(2):413–435. doi: 10.1007/s00521-017-3272-5. [DOI] [Google Scholar]
  • 34.Rana N, Latiff MSA, Abdulhamid SM, Chiroma H. Whale optimization algorithm: a systematic review of contemporary applications, modifications and developments. Neural Comput Appl. 2020;32(20):16245–16277. doi: 10.1007/S00521-020-04849-Z. [DOI] [Google Scholar]
  • 35.Röder M, Both A, Hinneburg A (2015) Exploring the space of topic coherence measures. In: WSDM 2015—Proceedings of the 8th ACM International Conference on Web Search and Data Mining, pp 399–408. 10.1145/2684822.2685324
  • 36.Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: 1st Int. Conf. Learn. Represent. ICLR 2013—Work. Track Proc. (Online)
  • 37.Salur MU, Aydin I. A novel hybrid deep learning model for sentiment classification. IEEE Access. 2020;8:58080–58093. doi: 10.1109/ACCESS.2020.2982538. [DOI] [Google Scholar]
  • 38.Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: 31st International Conference on Machine Learning, ICML 2014, vol 4. PMLR, pp 2931–2939 (Online)
  • 39.Priyadarshini I, Cotton C. A novel LSTM–CNN–grid search-based deep neural network for sentiment analysis. J Supercomput. 2021;77(12):13911–13932. doi: 10.1007/s11227-021-03838-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM. 2017;60(6):84–90. doi: 10.1145/3065386. [DOI] [Google Scholar]
  • 41.Wang X, Jiang W, Luo Z (2016) Combination of convolutional and recurrent neural network for sentiment analysis of short texts. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp 2428–2437 (Online)
  • 42.Sherstinsky A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys D Nonlinear Phenom. 2020;404:132306. doi: 10.1016/j.physd.2019.132306. [DOI] [Google Scholar]
  • 43.Zulqarnain M, Ishak SA, Ghazali R, Nawi NM, Aamir M, Hassim YMM. An improved deep learning approach based on variant two-state gated recurrent unit and word embeddings for sentiment classification. Int J Adv Comput Sci Appl. 2020 doi: 10.14569/ijacsa.2020.0110174. [DOI] [Google Scholar]
  • 44.Alizadeh M, Beheshti MTH, Ramezani A, Saadatinezhad H (2020) Network traffic forecasting based on fixed telecommunication data using deep learning. In: 2020 6th Iranian Conference on Signal Processing and Intelligent Systems ICSPIS 2020. 10.1109/ICSPIS51611.2020.9349573
  • 45.Mohakud R, Dash R. Designing a grey wolf optimization based hyper-parameter optimized convolutional neural network classifier for skin cancer detection. J King Saud Univ Comput Inf Sci. 2021 doi: 10.1016/J.JKSUCI.2021.05.012. [DOI] [Google Scholar]
  • 46.Goel T, Murugan R, Mirjalili S, Chakrabartty DK. OptCoNet: an optimized convolutional neural network for an automatic diagnosis of COVID-19. Appl Intell. 2021;51(3):1351–1366. doi: 10.1007/S10489-020-01904-Z/TABLES/8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Alizadeh M, Mousavi SE, Beheshti MTH, Ostadi A (2021) Combination of feature selection and hybrid classifier as to network intrusion detection system adopting FA, GWO, and BAT optimizers. In: 2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS), pp 1–7. 10.1109/ICSPIS54653.2021.9729365
  • 48.Brodzicki A, Piekarski M, Jaworek-Korjakowska J. The whale optimization algorithm approach for deep neural networks. Sensors. 2021;21(23):8003. doi: 10.3390/S21238003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Seilsepour A, Alizadeh M, Ravanmehr R, Beheshti MTH, Nassiri R (2022) Self-supervised sentiment classification based on semantic similarity measures and contextual embedding using metaheuristic optimizer. In: 2022 8th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), pp 1–7. 10.1109/ICSPIS56952.2022.10043914
  • 50.Wang J, Dong Y. Measurement of text similarity: a survey. Information. 2020;11(9):421. doi: 10.3390/info11090421. [DOI] [Google Scholar]
  • 51.Pang B, Lee L (2004) A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. arXiv:cs/0409058. (Online)
  • 52.Maas AL, Daly RE, Pham PT, Huang D, Ng AY, Potts C (2011) Learning word vectors for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol 1, pp 142–150
  • 53.Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. In: Processing, pp 1–6 (Online)
  • 54.Chollet F et al. (2015) “Keras.” GitHub (Online). Available: https://github.com/fchollet/keras
  • 55.Vrbančič G, Brezočnik L, Mlakar U, Fister D, Fister I Jr. NiaPy: Python microframework for building nature-inspired algorithms. J Open Source Softw. 2018. doi: 10.21105/joss.00613. [DOI] [Google Scholar]
  • 56.Sklearn Nature Inspired Algorithms (2020). https://sklearn-nature-inspired-algorithms.readthedocs.io/en/latest/
  • 57.Sidorov G, Gelbukh A, Gómez-Adorno H, Pinto D. Soft similarity and soft cosine measure: similarity of features in vector space model. Comput Sist. 2014;18(3):491–504. doi: 10.13053/CyS-18-3-2043. [DOI] [Google Scholar]
  • 58.Reimers N, Gurevych I (2019) Sentence-BERT: sentence embeddings using siamese BERT-networks. In: EMNLP-IJCNLP 2019–2019 Conference on Empirical Methods in Natural Language Processing. 9th Int. Jt. Conf. Nat. Lang. Process. Proc. Conf., pp 3982–3992. 10.18653/v1/d19-1410
  • 59.Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2019) ALBERT: a lite BERT for self-supervised learning of language representations. ArXiv, vol abs/1909.1 (Online)
  • 60.Liu Y et al. (2019) RoBERTa: a robustly optimized BERT pretraining approach (Online)
  • 61.Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in Neural Information Processing Systems, vol 30 (Online)



Articles from The Journal of Supercomputing are provided here courtesy of Nature Publishing Group
