. 2021 Nov 26;12(1):10. doi: 10.1007/s13278-021-00843-y

CAViaR-WS-based HAN: conditional autoregressive value at risk-water sailfish-based hierarchical attention network for emotion classification in COVID-19 text review data

B Venkateswarlu 1, V Viswanath Shenoi 1, Praveen Tumuluru 1
PMCID: PMC8620331  PMID: 34849175

Abstract

The Corona Virus Disease-2019 (COVID-19) pandemic has had a remarkable impact on economies and societies worldwide. With the numerous social distancing and lockdown measures, it has become essential to understand people's emotional responses on a very large scale. Thus, an effective emotion classification approach is developed using the proposed Conditional Autoregressive Value at Risk-Water Sailfish-based Hierarchical Attention Network (CAViaR-WS-based HAN) for classifying the emotions in COVID-19 text review data. The proposed optimization algorithm, named CAViaR-WS, is designed by incorporating the Conditional Autoregressive Value at Risk-Sail Fish (CAViaR-SF) and the Water Cycle Algorithm (WCA). Here, significant features, such as mean, variance, entropy, Term Frequency-Inverse Document Frequency (TF-IDF), SentiWordNet features, and spam word-based features, are extracted for further processing. Based on the extracted features, feature fusion is accomplished using the RideNN. In addition, a CAViaR-SF-based GAN is used to perform the spam classification, and then the emotion classification is carried out using a Hierarchical Attention Network (HAN), where the HAN is trained using the proposed CAViaR-WS. The developed CAViaR-WS-based HAN achieves maximal precision, recall, and F-measure values of 0.937, 0.958, and 0.948, respectively.

Keywords: Emotion analysis, Water cycle algorithm, SailFish optimizer, Hierarchical attention networks, COVID-19

Introduction

Recently, social media has become an important medium for people to express their views and has greatly influenced people's lives (Agarwal et al. 2011). Micro-blogging sites, such as Twitter, offer users a major opportunity to express their ideas and interact with communities, people, organizations, and groups (Chintala 2012). Micro-blogging sites have a very large number of active users. Twitter has over 316 million monthly active users, who create over 500 million messages. The open nature of the micro-blogging service and the total volume of data produced make Twitter a vital research target for social media analysis and Natural Language Processing (NLP). In addition, the Twitter community uses the platform in numerous ways for various needs, from casual chatter to regularly spreading news. Individuals convey their opinions on an extensive range of topics, such as sports matches, political candidates, and movies, share their experiences with certain services and products, or describe their current frame of mind (Stojanovski et al. 2018). Five features of social media, namely connectedness, collectivity, collaboration, clarity, and completeness, support the crisis management function. Twitter data are used because they are easy to understand and make information sharing and collection simpler. Moreover, Twitter offers unprecedented information to lawmakers and the general public (Mathur et al. 2020). Emotions play a significant part in rational decision-making, memory, social interaction, human intelligence, perception, learning, etc.

Sentiment analysis and opinion mining are subtopics of NLP and text mining that deal with the automatic discovery and extraction of knowledge about people's evaluations, sentiments, and opinions from textual data, such as review websites, personal blogs, and customer feedback forms. Sentiment analysis and opinion mining have gained significant attention because of their practical applications in today's environment (Balabantaray 2012). Emotions are a way to express people's thoughts and feelings. Online social media, like Facebook and Twitter, have transformed the language of communication. Nowadays, individuals can communicate emotions, facts, opinions, and emotion intensities on various topics in short texts. Evaluating the emotions expressed in social media content has gained significant attention in NLP research. It has extensive applications in social welfare, commerce, public health, etc. For example, it can be employed in public health (Chen et al. 2012), identification of public opinion regarding political tendencies (Cambria 2016), stock market observation (Yen 2014), and brand management. Emotion analysis is the method of estimating the attitude toward a topic or target. The attitude can be an emotional state, like joy, sadness, or anger (Mohammad 2016; Jabreel and Moreno 2019), or a polarity (negative or positive). Existing research on text analysis covers several kinds of text content, such as weblogs (Jung et al. 2006), news, text messages, stories (Alm et al. 2005), and spoken dialogs (Lee and Narayanan 2005). For numerous applications, detecting emotions at the document level may not be satisfactory. A text-driven emotion estimation approach would help detect the emotional affinity of individual sentences.

Analyzing emotions at the sentence level is significant for various emotion analysis systems. Emotion analysis in texts typically categorizes the texts into three categories, namely negative, positive, and neutral. This type of analysis can also be extended to finer-grained emotion classes, such as negative, positive, disgust, hate, joy, fear, and surprise. Several machine learning techniques (Aher and Jena 2021; Aher 2018) are used for classifying emotions. With the preventive measures and the evolving COVID-19 situation, it is necessary to know how non-governmental organizations (NGOs), governments, and social organizations support the individuals affected by COVID-19. Since online communication is performed and recorded as text data, classifying emotions during COVID-19 becomes a fundamental part of understanding and addressing the impact of COVID-19 on people (Kleinberg et al. 2020). Analyzing the sentiments and content expressed on Twitter in the initial phases of COVID-19 helps in understanding how events affected the beliefs, sentiments, and thoughts of the general public. This type of understanding would create large-scale opportunities for education and for broadcasting correct information regarding public health recommendations (Medford et al. 2020).

This research focuses on devising a robust technique for emotion classification in COVID-19 text review data using the proposed CAViaR-WS-based HAN. Initially, the text review data are gathered from the dataset and presented to the feature extraction module to extract the features needed for further processing. After that, the extracted features are presented to the feature fusion module, where they are fused by RideNN. The fused features are then given to the spam classification module, in which a CAViaR-SF-based GAN classifies the reviews as spam or non-spam. Finally, emotion classification is carried out using the proposed CAViaR-WS-based HAN.

The key contribution of this research is as follows:

  • Proposed CAViaR-WS-based HAN: An effective technique is devised for emotion classification in COVID-19 text review data using the proposed CAViaR-WS-based HAN. Feature fusion is done using the RideNN, and spam classification is performed using the CAViaR-SF-based GAN. In addition, emotion classification is carried out using the HAN, which is trained by the developed CAViaR-WS algorithm, designed by integrating CAViaR-SF and WCA.

The research paper is organized as follows: Sect. 2 describes several techniques employed for emotion classification, and Sect. 3 illustrates the developed technique for the emotion classification in COVID-19 text review data. Section 4 portrays the results and discussion, and finally, Sect. 5 concludes the paper.

Motivation

In this section, various emotion classification approaches are reviewed along with their advantages and disadvantages that motivate the researchers to design the proposed approach for performing the emotion classification in the COVID-19 text review data.

Literature review

Various existing emotion classification approaches are reviewed in this section. Kleinberg et al. (2020) developed a ground truth dataset for measuring and expressing emotions from text data. This method made it more feasible to infer worries and concerns from text data. However, it did not capture the emotional responses of the sample over a longer period. Ahmad et al. (2019) introduced a Long Short-Term Memory-Convolutional Neural Network (LSTM-CNN) model to analyze emotions. This method achieved better categorization of tweets as non-extremist and extremist. However, it did not offer an automated approach for crawling, storing, and cleaning Twitter content. Jabreel and Moreno (2019) devised a deep learning model for classifying multi-label emotions in tweets. Here, the system was interpretable by visualizing the attention weights, but the approach failed to model the relationships among the labels and phrases. Kamal (2019) designed a lexicon-based technique for categorizing emotions. This method achieved effective statistical results based on human emotions, but it suffers from a higher processing time.

Stojanovski et al. (2018) developed a deep neural network (DNN) for the detection of emotions from Twitter texts. This method achieved better performance in sentence categorization tasks. However, it suffers from higher computational complexity. Asghar et al. (2018) introduced a Twitter sentiment analysis framework (T-SAF) for sentiment classification. This approach focused strongly on domain-specific words and achieved better categorization of slang expressed in tweets. However, it did not consider the automatic scoring of domain-specific words. Yu et al. (2018) devised a Dual Attention Transfer Network (DATN) for sentiment classification. This technique attained enhanced multi-label emotion categorization results and accurately estimated general sentiment words and emotion-specific emojis. However, it did not consider larger datasets in order to yield effective results. Illendula and Sheth (2019) introduced a multimodal emotion classification technique. This method achieved better accuracy for the emotion classification task. However, it did not use human-annotated test sets to enhance the evaluation results.

Challenges

The challenges faced by existing emotion classification techniques are as follows:

  • LSTM-CNN model introduced in Ahmad et al. (2019) obtained better Twitter classification results. However, it lacks the ability to incorporate social and context features to enhance the system performance and achieve effective results.

  • The major limitation of the ground truth dataset developed in Kleinberg et al. (2020) is that it failed to capture relevant constructs, as the rate of out-of-vocabulary terms in the data was low.

  • The emotional modeling developed in Arulmurugan et al. (2019) efficiently improved the sentence-level sentiment categorization performance with improved sentimental classification accuracy. However, it failed to increase the classification accuracy for the large corpora of document-level or sentence-level sentiment categorization.

  • The recurrent neural network (RNN) model in Tai et al. (2018) focused only on the generation of review texts, and it failed to improve the authenticity of the machine-generated text.

Proposed CAViaR-WS-based HAN for emotion classification from COVID-19 text review data

This section illustrates the emotion classification technique, named CAViaR-WS-based HAN, for classifying the emotions from COVID-19 text review data. The phases involved in this technique are feature extraction, feature fusion, spam classification, and emotion classification. Initially, the text review data are taken as input and presented to the feature extraction module in order to extract the important features, such as mean, variance, entropy, TF-IDF, SentiWordNet-based features, and spam words-based features. After that, the extracted features are presented to the feature fusion phase, where they are fused using RideNN (Binu and Kariyappa 2018). Once the feature fusion is done, the fused features are given to the spam classification phase for classification as spam or non-spam using the CAViaR-SF-based GAN, which is designed by integrating GAN (Usman et al. 2020), CAViaR (Engle and Manganelli 2004), and SFO (Shadravan et al. 2019). Finally, emotion classification is performed by combining the input text review data with the output of the spam classification phase and classifying the emotions into worry, anger, disgust, fear, anxiety, sadness, happiness, relaxation, and desire using the HAN (Yang et al. 2016). The HAN is trained using the proposed optimization algorithm, named CAViaR-WS, which is derived by combining CAViaR-SF and WCA (Eskandar 2012). Figure 1 depicts the schematic representation of the developed CAViaR-WS-based HAN.

Fig. 1. Schematic view of the proposed CAViaR-WS-based HAN for emotion classification from COVID-19 text review data

Input data acquisition

Let us consider the input data $G$ with various attributes, expressed as

$$G = \{G_{a,b}\};\quad 1 \le a \le d,\ 1 \le b \le c \tag{1}$$

where $G_{a,b}$ signifies the COVID-19 text review data in the database $G$ with the $a$th attribute in the $b$th data point. Here, $c$ denotes the total number of data points, and $d$ indicates the total number of attributes. Each data point is subjected to the feature extraction phase to extract the significant features for further processing.

Feature extraction

The input COVID-19 text review data $G_{a,b}$ are presented to the feature extraction phase for extracting various features, such as mean, variance, entropy, TF-IDF, SentiWordNet, and spam words-based features, which are explained below.

Mean

The mean is calculated as the average value of the words present in an individual review, and the equation is formulated as

$$F_1 = \frac{1}{|S(u_o)|} \times \sum_{o=1}^{|S(u_o)|} S(u_o) \tag{2}$$

where $S(u_o)$ denotes the value of the words, $|S(u_o)|$ indicates the total number of words in the review data, and $F_1$ signifies the mean feature.

Variance

The variance feature $F_2$ is computed with respect to the measured mean value, and the equation is formulated as

$$F_2 = \frac{\sum_{o=1}^{|S(u_o)|} \left| u_o - F_1 \right|}{|S(u_o)|} \tag{3}$$

Here, $F_1$ indicates the mean value.

Entropy

The entropy of the text review data is computed attribute-wise and is expressed as

$$E(\rho_C, \rho_D) = \sum_{C=1}^{v(\rho_C)} A(\rho_C = C, \rho_D = D) \cdot \log A(\rho_C = C, \rho_D = D) \tag{4}$$

where $E(\rho_C, \rho_D)$ signifies the entropy of the data attributes, $v(\rho_C)$ represents the unique attribute values, and $C, D$ indicate the attribute positions in the data. The entropy feature is denoted as $F_3$.
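The three statistical features above can be illustrated with a minimal Python sketch. It assumes that each review has already been mapped to a list of numeric word values (the text does not specify this mapping), it uses the textbook population variance rather than the exact form of Eq. (3), and it computes Shannon entropy over the empirical distribution of the values. The function name `statistical_features` is hypothetical.

```python
import math
from collections import Counter


def statistical_features(values):
    """Return (mean, variance, entropy) for a non-empty list of numeric word values.

    Assumption: standard textbook definitions, not necessarily the paper's exact forms.
    """
    n = len(values)
    mean = sum(values) / n
    # Population variance: average squared deviation from the mean.
    variance = sum((v - mean) ** 2 for v in values) / n
    # Shannon entropy (natural log) of the empirical value distribution.
    counts = Counter(values)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return mean, variance, entropy
```

For example, `statistical_features([1, 1, 3, 3])` gives a mean of 2.0, a variance of 1.0, and an entropy of ln 2, since the two distinct values are equally likely.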

TF-IDF

TF-IDF is a numerical statistic that reflects the significance of words in text data. It is most commonly used as a weighting factor in information retrieval and text mining, and it is generally employed for filtering out uninformative words in text classification. In general, TF measures how often each attribute occurs in the text review data, whereas IDF measures how rarely the attribute occurs across the text review data files (Christian 2016). The TF-IDF feature $F_4$ is expressed as

$$F_4 = T \cdot \log(1 + f_1) \cdot \log(f_2) \tag{5}$$

where $T$ signifies the total number of documents, $f_1$ represents the TF value, and $f_2$ denotes the IDF value.
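To make the roles of TF and IDF concrete, the following is a minimal Python sketch of a commonly used log-scaled TF-IDF variant. It is an illustration under stated assumptions and not necessarily the exact weighting of Eq. (5): documents are assumed to be pre-tokenized lists of strings, and the function name `tf_idf` is hypothetical.

```python
import math


def tf_idf(term, document, corpus):
    """Log-scaled TF-IDF of `term` in `document`, relative to `corpus`.

    `document` is a list of tokens and `corpus` is a list of such documents.
    Assumption: weight = log(1 + tf) * log(N / df), a common textbook variant.
    """
    tf = document.count(term)                         # term frequency in this document
    df = sum(1 for doc in corpus if term in doc)      # number of documents containing the term
    if tf == 0 or df == 0:
        return 0.0
    return math.log(1 + tf) * math.log(len(corpus) / df)
```

A term that occurs in every document receives a weight of zero under this variant, which is why TF-IDF is useful for suppressing uninformative words.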

SentiWordNet

The SentiWordNet feature construction is a two-stage process. The first step is the exploration of WordNet term relations, such as synonymy, antonymy, and hyponymy, in order to extend the core set of seed words. The second step is the computation of positive and negative labels for the subset of WordNet expressions (Ohana and Tierney 2009). The SentiWordNet feature yields a positive and a negative score and is expressed as

$$(P_a, N_a) = SW(r_x) \tag{6}$$

where the SentiWordNet function is denoted as $SW(\cdot)$, $P_a$ signifies the positive score, and $N_a$ indicates the negative score. The features computed from the positive and negative scores are expressed as

$$f_1^x = P_a \tag{7}$$
$$f_2^x = N_a \tag{8}$$

where $f_1^x$ and $f_2^x$ are the features measured from the positive score and the negative score of SentiWordNet, and $F_5$ signifies the SentiWordNet features.

Spam word-based features

Spam detection is a significant process, as it helps in evaluating the genuine feedback in customer reviews and in estimating accurate review scores. The spam word-based features are obtained by assessing the review contents using properties of the reviews. One such property is the total number of review words, since spam reviews tend to contain more words than truthful reviews. Truthful reviews also differ from spam reviews in their wording: spam reviews include more pronouns, while truthful reviews include more nouns, pronouns, adjectives, determiners, and coordinating conjunctions (Saeed et al. 2018). The features obtained to detect spam are given as

$$f_3^x = \frac{C_g}{q} \tag{9}$$

where, for the $x$th review, the spam words and the total word count are denoted as $C_g$ and $q$, respectively, and

$$f_4^x = \frac{f_{C_g}}{f} \tag{10}$$

where the spam word frequency is signified as $f_{C_g}$, and $F_6$ signifies the spam words-based features.
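The spam word-based features can be sketched in Python as follows. This is a minimal illustration, assuming whitespace tokenization and a small hand-made spam lexicon; the names `SPAM_WORDS` and `spam_word_features` are hypothetical, and the lexicon is for illustration only. The first returned value follows the reading of Eq. (9) as the share of spam words among all words of a review; the second is a per-word count of spam terms.

```python
from collections import Counter

# Hypothetical spam lexicon, for illustration only (not taken from the paper).
SPAM_WORDS = {"free", "winner", "click", "offer"}


def spam_word_features(review, spam_words=SPAM_WORDS):
    """Return (spam_ratio, spam_counts) for one review string.

    spam_ratio  : number of spam tokens divided by the total token count.
    spam_counts : Counter mapping each spam word to its frequency in the review.
    Assumption: lowercase whitespace tokenization; punctuation is not stripped.
    """
    tokens = review.lower().split()
    spam_counts = Counter(t for t in tokens if t in spam_words)
    if not tokens:
        return 0.0, spam_counts
    spam_ratio = sum(spam_counts.values()) / len(tokens)
    return spam_ratio, spam_counts
```

For the review `"Click here for a FREE offer free"`, four of the seven tokens are in the lexicon, so the ratio is 4/7 and the word `free` is counted twice.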

Feature fusion

Finally, the extracted features, such as mean, variance, entropy, TF-IDF, SentiWordNet, and spam words-based features, are combined to generate a feature vector, which reduces the complexity of evaluating the reviews. The feature vector obtained from the features is expressed as

$$F = \{F_1, F_2, F_3, F_4, F_5, F_6\} \tag{11}$$

where $F_1$ represents the mean, $F_2$ denotes the variance, $F_3$ signifies the entropy, $F_4$ represents the TF-IDF features, $F_5$ indicates the SentiWordNet-based features, and $F_6$ signifies the spam-based features.

The features are sorted based on their correlation, and the feature fusion expression is given as

$$F^{fused} = \sum_{n=1}^{k} \frac{\alpha}{h} F_i \tag{12}$$
$$n = 1 + \frac{I}{H} \tag{13}$$
$$G = \frac{I}{h} \tag{14}$$

where the value $h$ lies in the range $1 < h \le k$, $I$ represents the number of features, and $k$ signifies the number of selected features.

RideNN for computing α

This section illustrates the process of computing the parameter $\alpha$ using the RideNN (Binu and Kariyappa 2018). RideNN is a Neural Network (NN) whose training is performed by the Rider Optimization Algorithm (ROA). RideNN offers cost and time benefits and is suitable for large-scale applications. It contains three layers, namely the input layer, the hidden layer, and the output layer, each comprising several neurons. The feature vector $F$ is presented as input to the RideNN, and the parameter $\alpha$ is computed by the following expression,

$$\alpha = \mathrm{corr}(Q_i, e_i) \tag{15}$$

where the text review data are denoted as $Q_i$, and $e_i$ represents the average of $Q_i$ belonging to the class.
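As an illustration of the correlation in Eq. (15), the sketch below computes a Pearson correlation coefficient in plain Python. That `corr` denotes the Pearson coefficient is an assumption, since the text does not name the correlation measure; the function name `pearson_corr` is hypothetical.

```python
import math


def pearson_corr(x, y):
    """Pearson correlation between two equal-length numeric sequences.

    Assumption: `corr` in Eq. (15) is the Pearson coefficient.
    Returns 0.0 when either sequence has zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

A value close to 1 or -1 marks a feature that varies strongly with the class average, which is the kind of signal a correlation-based weight such as $\alpha$ would emphasize.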

Spam classification using CAViaR-SF-based GAN

Here, the classification of reviews as spam or non-spam is carried out using the developed CAViaR-SF-based GAN approach. The fused features $F^{fused}$ are fed as input to the spam classification phase, which is performed using the GAN classifier (Usman et al. 2020). The GAN is used for achieving efficient spam classification results in complex cases. It consists of two units, namely the generator and the discriminator. The generator tries to confuse the discriminator by generating plausible data, whereas the discriminator distinguishes the fake data from the real data. The generator and the discriminator are trained simultaneously to achieve global convergence. The GAN is trained using CAViaR-SF, which is the combination of CAViaR (Engle and Manganelli 2004) and SFO (Shadravan et al. 2019). SFO (Shadravan et al. 2019) is a nature-inspired optimization technique for solving optimization problems without any structural variations. It improves diversification and intensification, which facilitates efficient tuning of the internal model parameters of the GAN. CAViaR (Engle and Manganelli 2004), on the other hand, describes the distribution of returns in terms of its quantiles. The unknown parameters are determined by modeling the evolution of the quantiles over time with an autoregressive process within the regression quantile framework. The algorithmic steps of the CAViaR-SF-based GAN are as follows:

  • i.

    Initialization

    Let us initialize the population with $\omega$ solutions, expressed as
    $$X = \{X_1, X_2, \ldots, X_e, \ldots, X_\omega\};\quad 1 \le e \le \omega \tag{16}$$
    where $\omega$ represents the total number of solutions, and $X_e$ denotes the $e$th solution.
  • ii.

    Determination of error

    The optimal solution is found using the fitness function. This is treated as a minimization problem, and therefore, the solution with the lowest Mean Square Error (MSE) is considered the optimal solution. The equation is expressed as
    $$MS_{err} = \frac{1}{\rho} \sum_{d=1}^{\rho} \left[ C_d - C_d^{*} \right]^2 \tag{17}$$
    where $C_d$ is the classifier output of the GAN, $C_d^{*}$ represents the target output, and $\rho$ denotes the number of data samples, with $1 < d \le \rho$.
  • iii.

    Elitism

    Elitism copies the unmodified fittest solutions into the next generation. Here, the sailfish with the finest location is saved in all iterations and is regarded as elite. The elite sailfish is the fittest sailfish that has been achieved so far, and it can influence the acceleration and maneuverability of the sardines during the attack.

  • iv.

    Attack alternation strategy

    According to the attack alternation mechanism of SFO, sailfish hunt and herd their prey. While herding, each sailfish adjusts its location based on the positions of the other hunters around the prey school, without direct coordination among them. SFO models this attack-alternation strategy as it is used during group hunting. In the exploration phase, the search agents cover a large portion of the search space to find promising solutions that are then refined. Sailfish can attack in every possible direction and inside a shrinking circle. Accordingly, the location of a sailfish is updated within a sphere around the best solution. The update equation of the attack alternation strategy of SFO (Shadravan et al. 2019) is expressed as
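    $$Y^{i}_{new\_SF} = Y^{i}_{elite\_SF} - \lambda_i \times \left( \operatorname{rand}(0,1) \times \left( \frac{Y^{i}_{elite\_SF} + Y^{i}_{injured\_K}}{2} \right) - Y^{i}_{old\_SF} \right) \tag{18}$$
    Representing $Y^{i}_{new\_SF}$ as $Y^{i}_{w+1}$ and $Y^{i}_{old\_SF}$ as $Y^{i}_{w}$,
    $$Y^{i}_{w+1} = Y^{i}_{elite\_SF} - \lambda_i \times \left( \operatorname{rand}(0,1) \times \left( \frac{Y^{i}_{elite\_SF} + Y^{i}_{injured\_K}}{2} \right) - Y^{i}_{w} \right) \tag{19}$$
    According to CAViaR (Engle and Manganelli 2004), the standard equation is given as
    $$Y^{i}_{w} = l_0 + \sum_{p=1}^{z} l_p Y^{i}_{w-p} + \sum_{j=1}^{t} l_j V\left(Y^{i}_{w-j}\right) \tag{20}$$
    Assuming $z = t = 2$,
    $$Y^{i}_{w} = l_0 + l_1 Y^{i}_{w-1} + l_2 Y^{i}_{w-2} + l_1 V\left(Y^{i}_{w-1}\right) + l_2 V\left(Y^{i}_{w-2}\right) \tag{21}$$
    SFO is combined with CAViaR by substituting Eq. (21) into Eq. (19), and thus, the standard equation of CAViaR-SF is expressed as
    $$Y^{i}_{w+1} = Y^{i}_{elite\_SF} - \lambda_i \times \left( \operatorname{rand}(0,1) \times \left( \frac{Y^{i}_{elite\_SF} + Y^{i}_{injured\_K}}{2} \right) - \left[ l_0 + l_1 Y^{i}_{w-1} + l_2 Y^{i}_{w-2} + l_1 V\left(Y^{i}_{w-1}\right) + l_2 V\left(Y^{i}_{w-2}\right) \right] \right) \tag{22}$$
    $$Y^{i}_{w+1} = Y^{i}_{elite\_SF} - \lambda_i \operatorname{rand}(0,1) \left( \frac{Y^{i}_{elite\_SF} + Y^{i}_{injured\_K}}{2} \right) + \lambda_i \left[ l_0 + l_1 Y^{i}_{w-1} + l_2 Y^{i}_{w-2} + l_1 V\left(Y^{i}_{w-1}\right) + l_2 V\left(Y^{i}_{w-2}\right) \right] \tag{23}$$
    $$\lambda_i = 2 \times \operatorname{rand}(0,1) \times PD - PD \tag{24}$$
    $$PD = 1 - \frac{N_{SF}}{N_{SF} + N_K} \tag{25}$$
    where $l$ denotes the unknown parameter for the vector $P$, $V(\cdot)$ represents the fitness of the solution, $Y^{i}_{elite\_SF}$ signifies the best location of the elite sailfish found so far, $Y^{i}_{injured\_K}$ signifies the best position of the injured sardine found so far, $\lambda_i$ represents the coefficient at the $i$th iteration, $PD$ signifies the prey density, $N_{SF}$ signifies the number of sailfish, and $N_K$ specifies the number of sardines.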
  • v.

    Hunting and catching prey

    According to the hunting and catching prey mechanism in SFO, sailfish have maximum energy to catch prey at the beginning of the search, when the sardines are not yet injured or tired. The attack power of the sailfish gradually decreases over time during hunting. Under frequent and strong attacks, the energy stores of the prey are depleted, and the prey becomes less able to detect the location of the sailfish. To mimic this process, the sardines update their position at every iteration with respect to the current best location of the sailfish and the attack power. The update equation obtained by the hunting and catching prey strategy is expressed as
    $$Y^{i}_{new\_K} = Z \times \left( Y^{i}_{elite\_SF} - \left[ l_0 + l_1 Y^{i}_{old\_K-1} + l_2 Y^{i}_{old\_K-2} + l_1 V\left(Y^{i}_{old\_K-1}\right) + l_2 V\left(Y^{i}_{old\_K-2}\right) \right] + AP \right) \tag{26}$$
    where $Y^{i}_{elite\_SF}$ signifies the optimal solution of the elite sailfish, $Y^{i}_{old\_K}$ denotes the present location of the sardine, $Z$ signifies a random number in the range $[0,1]$, and $AP$ indicates the attack power of the sailfish. Finally, the CAViaR-SF-based GAN classifies the features into spam and non-spam, and the spam classification output is denoted as $R_\omega$.
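The attack-alternation update of Eqs. (21) and (23) to (25) can be sketched in Python for a single scalar coordinate. This is a minimal illustration under stated assumptions, not the authors' implementation: positions are plain floats rather than vectors, the fitness function `fitness` and the coefficients `l = (l0, l1, l2)` are supplied by the caller, and all function names are hypothetical.

```python
import random


def prey_density(n_sailfish, n_sardines):
    """Eq. (25): PD = 1 - N_SF / (N_SF + N_K)."""
    return 1.0 - n_sailfish / (n_sailfish + n_sardines)


def caviar_term(l, y_prev1, y_prev2, fitness):
    """Eq. (21): CAViaR value with z = t = 2 from the two previous positions."""
    l0, l1, l2 = l
    return (l0 + l1 * y_prev1 + l2 * y_prev2
            + l1 * fitness(y_prev1) + l2 * fitness(y_prev2))


def caviar_sf_update(y_elite, y_injured, y_prev1, y_prev2, l, fitness,
                     n_sailfish, n_sardines, rng=random):
    """Eq. (23): new sailfish position for one scalar coordinate.

    `rng` is any object with a random() method returning a float in [0, 1).
    """
    pd = prey_density(n_sailfish, n_sardines)
    lam = 2.0 * rng.random() * pd - pd          # Eq. (24)
    r = rng.random()                            # rand(0, 1) in Eq. (23)
    return (y_elite
            - lam * r * (y_elite + y_injured) / 2.0
            + lam * caviar_term(l, y_prev1, y_prev2, fitness))
```

Passing the random source as `rng` keeps the sketch reproducible, for instance with `random.Random(0)`; in a full optimizer the same update would be applied to every coordinate of every sailfish.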

Emotion classification using HAN

In this phase, the input COVID-19 text review data $G_{a,b}$ and the output $R_\omega$ obtained from the spam classification phase are fed as input to the HAN (Yang et al. 2016) in order to classify the emotions in the COVID-19 text review data, where the HAN classifier is trained using the developed CAViaR-WS.

Architecture of HAN

The structure of the HAN (Yang et al. 2016) is presented in Fig. 2. It comprises several units, such as the attention module, long short-term memory (LSTM), softmax, and the self-attention module. The model takes the spam classification output $R_\omega$, and the HAN classifier is used for classifying the emotions in the COVID-19 text review data.

Fig. 2. Architecture of HAN

  • Attention model

    The attention model comprises three modules, namely the CNN, the LSTM, and the attention layer, which are explained below.

    CNN: Here, the feature layers of VGGNet are used for extracting the feature maps. The initial procedure is to rescale the image to 448×448 pixels, so that the output generated from the VGGNet feature layers has a size of 512×14×14. The resulting 512×196 dimension vector is passed to a fully connected layer with the tanh function, which converts it to a 1024×196 dimension vector:
    $$M_{ij} = \tanh(L_j M_j + t_j) \tag{27}$$
    where $M_j$ represents the feature vector of every region, and $M_{ij}$ expresses every region. The extension of dimensions facilitates the subsequent integration process.
    LSTM: The LSTM includes several memory cells and updates the cell state in four phases. The first phase decides which information is discarded from the cell state. The following phases decide which new information is saved in the cell state, and the equations are expressed as
    $$y_m = \sigma\left(L_y \cdot (z_{m-1}, s_m) + t_y\right) \tag{28}$$
    $$j_m = \sigma\left(L_j \cdot (z_{m-1}, s_m) + t_j\right) \tag{29}$$
    $$W_j = \tanh\left(L_W \cdot (z_{m-1}, s_m) + t_W\right) \tag{30}$$
    where $W_j$ represents the candidate memory, $s_m$ signifies the input vector, $z_m$ denotes the hidden state, $y_m$ specifies the forget gate, and $j_m$ indicates the input gate. Here, the old state $W_{m-1}$ is updated to the new state $W_m$, which is specified as
    $$W_m = y_m W_{m-1} + j_m W_j \tag{31}$$
    $$o_m = \sigma\left(L_o \cdot (z_{m-1}, s_m) + t_o\right) \tag{32}$$
    $$z_m = o_m \tanh(W_m) \tag{33}$$
    where $o_m$ represents the output gate.
    Attention layer: Here, $L_{eq}$ and $L_{ei}$ are presented to a fully connected layer and combined with the tanh function. Applying the Softmax function then yields the attention distribution map.
    $$z_{att} = \tanh\left( (L_{ei,att} \cdot M_{ij} + t_{ij,att}) \oplus (L_{eq} + t_{eq,att}) \right) \tag{34}$$
    $$C_\xi = \mathrm{Softmax}(L_p \cdot z_{att} + t_p) \tag{35}$$
    where $L_{ei}$ signifies the 196×1024 dimension matrix, $L_{eq}$ represents the 1024 dimension vector, $L_{ei,att}$ and $L_{eq,att}$ indicate the 1024×512 dimension matrices, $L_p$ specifies the 512 dimension vector, $C_\xi$ signifies the attention distribution map, and $\oplus$ specifies vector addition. The weighted sum is determined by the attention distribution map and is expressed as
    $$M_j = \sum p_j M_{ij} \tag{36}$$
  • Self-Attention model

    Here, the self-attention model is utilized for gathering global information, and the equation is given as
    $$\varepsilon_j = s_j + \sum_{q=1}^{N_p} \frac{h(s_j, n_j)}{\beta} \left( L_u \cdot s_q \right) \tag{37}$$
    where $h(\cdot)$ signifies the function relating positions $j$ and $q$, $L_u$ represents the linear transform, and the normalization factor is denoted as $\beta$.

    Thus, the output obtained from the HAN is denoted as O that signifies the classified emotions, namely worry, anger, disgust, fear, anxiety, sadness, happiness, relaxation, and desire.
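The softmax-weighted sum at the core of the attention layer (Eqs. 35 and 36) can be sketched in plain Python. This is a minimal illustration, assuming that the attention scores have already been produced by the preceding tanh layer; the function names `softmax` and `attend` are hypothetical, and real implementations would use a tensor library instead of lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, scores):
    """Return (weights, context) for equal-length feature vectors.

    weights : attention distribution obtained by softmax over `scores`.
    context : weighted sum of `vectors` under that distribution.
    """
    weights = softmax(scores)
    dim = len(vectors[0])
    context = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]
    return weights, context
```

With equal scores the weights are uniform, so `attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])` returns the plain average of the two vectors; raising one score shifts the context vector toward the corresponding region.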

Training process of CAViaR-WS

The HAN (Yang et al. 2016) is trained using the proposed optimization algorithm, named CAViaR-WS, which is formed by hybridizing CAViaR-SF and WCA (Eskandar 2012). The classifier weights are trained using the developed CAViaR-WS in order to attain an optimal solution. CAViaR-SF is devised by combining CAViaR and SFO (Shadravan et al. 2019). WCA (Eskandar 2012) is a nature-inspired algorithm for solving constrained optimization problems. It is developed by observing the water cycle and how rivers and streams flow into the sea. The WCA starts with an initial population of raindrops, where the best raindrop is chosen as the sea. After that, the next best raindrops are chosen as rivers, and the remaining raindrops are treated as streams flowing to the rivers and the sea. The assignment of streams depends on the magnitude of flow. Incorporating the WCA into CAViaR-SF is intended to improve the efficiency of the developed scheme and reduce the computational complexity. The algorithmic steps involved in CAViaR-WS are described below:

  • (i)

    Initialization: The foremost step is the initialization of the solution and is formulated in Eq. (16)

  • (ii)
    Compute fitness measure: The fitness measure is employed for obtaining the optimal solution by finding the best fitness value, and the equation for computing the fitness measure is given as
    $$Fitness = \frac{1}{K} \sum_{\theta=1}^{K} \left[ O - U_\tau \right]^2 \tag{38}$$
    where $O$ denotes the classifier output, and $U_\tau$ signifies the target output.

    In addition, the elitism and the attack alternation strategy of the SFO algorithm are explained in Sect. 3.4 above.

  • (iii)
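    Determination of update solution: To improve the algorithmic performance, the WCA is incorporated into the CAViaR-SF equation formulated in (23) to develop the proposed CAViaR-WS for classifying the emotions. As per WCA (Eskandar 2012), the update equation is given by
    $$Y^{new}_{stream} = Y_{sea} + \mu \times \operatorname{randn}(1, B_{var}) \tag{39}$$
    Representing $Y^{new}_{stream}$ as $Y^{i}_{w+1}$,
    $$Y_{sea} = Y^{i}_{w+1} - \mu \operatorname{randn}(1, B_{var}) \tag{40}$$
    Writing $C^{i}_{w} = l_0 + l_1 Y^{i}_{w-1} + l_2 Y^{i}_{w-2} + l_1 V\left(Y^{i}_{w-1}\right) + l_2 V\left(Y^{i}_{w-2}\right)$ for the CAViaR term as shorthand, Eq. (23) expands to
    $$Y^{i}_{w+1} = Y^{i}_{elite\_SF} - \frac{\lambda_i \operatorname{rand}(0,1)\, Y^{i}_{elite\_SF}}{2} - \frac{\lambda_i \operatorname{rand}(0,1)\, Y^{i}_{injured\_K}}{2} + \lambda_i C^{i}_{w} \tag{41}$$
    $$Y^{i}_{w+1} = Y^{i}_{elite\_SF} \left( 1 - \frac{\lambda_i \operatorname{rand}(0,1)}{2} \right) - \frac{\lambda_i \operatorname{rand}(0,1)\, Y^{i}_{injured\_K}}{2} + \lambda_i C^{i}_{w} \tag{42}$$
    Since $Y_{sea}$ and $Y^{i}_{elite\_SF}$ are both the optimal solutions, Eq. (40) is substituted for $Y^{i}_{elite\_SF}$:
    $$Y^{i}_{w+1} = \left( Y^{i}_{w+1} - \mu \operatorname{randn}(1, B_{var}) \right) \left( 1 - \frac{\lambda_i \operatorname{rand}(0,1)}{2} \right) - \frac{\lambda_i \operatorname{rand}(0,1)\, Y^{i}_{injured\_K}}{2} + \lambda_i C^{i}_{w} \tag{43}$$
    $$Y^{i}_{w+1} = Y^{i}_{w+1} \left( 1 - \frac{\lambda_i \operatorname{rand}(0,1)}{2} \right) - \mu \operatorname{randn}(1, B_{var}) \left( 1 - \frac{\lambda_i \operatorname{rand}(0,1)}{2} \right) - \frac{\lambda_i \operatorname{rand}(0,1)\, Y^{i}_{injured\_K}}{2} + \lambda_i C^{i}_{w} \tag{44}$$
    $$Y^{i}_{w+1} - Y^{i}_{w+1} \left( 1 - \frac{\lambda_i \operatorname{rand}(0,1)}{2} \right) = -\mu \operatorname{randn}(1, B_{var}) \left( 1 - \frac{\lambda_i \operatorname{rand}(0,1)}{2} \right) - \frac{\lambda_i \operatorname{rand}(0,1)\, Y^{i}_{injured\_K}}{2} + \lambda_i C^{i}_{w} \tag{45}$$
    $$Y^{i}_{w+1} \left( 1 - 1 + \frac{\lambda_i \operatorname{rand}(0,1)}{2} \right) = -\mu \operatorname{randn}(1, B_{var}) \left( 1 - \frac{\lambda_i \operatorname{rand}(0,1)}{2} \right) - \frac{\lambda_i \operatorname{rand}(0,1)\, Y^{i}_{injured\_K}}{2} + \lambda_i C^{i}_{w} \tag{46}$$
    $$Y^{i}_{w+1} = \frac{2}{\lambda_i \operatorname{rand}(0,1)} \left[ \lambda_i C^{i}_{w} - \mu \operatorname{randn}(1, B_{var}) \left( 1 - \frac{\lambda_i \operatorname{rand}(0,1)}{2} \right) - \frac{\lambda_i \operatorname{rand}(0,1)\, Y^{i}_{injured\_K}}{2} \right] \tag{47}$$
    where $\mu$ signifies the coefficient that expresses the limit of the search area near the sea and is set to 0.1, and $\operatorname{randn}$ denotes a normally distributed random number.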

    Furthermore, the update equation achieved by the hunting and catching prey strategy of SFO is expressed in Eq. (26) in Sect. 3.4.

  • (iv)

    Evaluating feasibility: The fitness value is computed for each individual, and the individual with the best fitness value is taken as the best solution.

  • (v)

    Termination: The above steps are repeated until the maximum number of iterations is reached, at which point the optimum weights are obtained. The pseudo-code of the developed CAViaR-SF is illustrated in Algorithm 1.

    Finally, the developed CAViaR-WS-based HAN classifies the emotions from the COVID-19 text review data in an effective way.
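
The fitness measure of Eq. (38) and the update rules of Eqs. (39) and (47) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`fitness`, `wca_stream_update`, `caviar_term`, `caviar_ws_update`) are hypothetical, the form of V is not given in this section so an identity placeholder is used, the coefficients l0, l1, l2 are arbitrary illustrative values, λ_i is supplied by the caller rather than derived from the SFO rules of Sect. 3.4, and every rand(0,1) in Eq. (47) is taken to be the same draw, as the algebra in Eqs. (42) to (47) implies.

```python
import random


def fitness(outputs, targets):
    """Eq. (38): mean of the squared differences between outputs O and targets U."""
    return sum((o - u) ** 2 for o, u in zip(outputs, targets)) / len(outputs)


def wca_stream_update(y_sea, mu, rng):
    """Eq. (39): new stream position = sea position + mu * randn, per dimension."""
    return [y + mu * rng.gauss(0.0, 1.0) for y in y_sea]


def caviar_term(y_w1, y_w2, l0, l1, l2, V):
    """Bracketed CAViaR expression that appears in Eqs. (41) to (47)."""
    return l0 + l1 * y_w1 + l2 * y_w2 + l1 * V(y_w1) + l2 * V(y_w2)


def caviar_ws_update(y_w1, y_w2, y_injured, lam, r, noise, mu=0.1,
                     l=(0.0, 0.5, 0.5), V=lambda y: y):
    """Eq. (47) for one dimension of one solution.

    y_w1, y_w2 : positions at iterations w-1 and w-2
    y_injured  : injured-prey position
    lam        : lambda_i (assumed to be given; must be non-zero)
    r          : one rand(0,1) draw (must be non-zero)
    noise      : one randn draw
    l, V       : illustrative CAViaR coefficients and placeholder V (assumptions)
    """
    c = caviar_term(y_w1, y_w2, l[0], l[1], l[2], V)
    shrink = 1.0 - lam * r / 2.0
    return (2.0 / (lam * r)) * (lam * c - mu * noise * shrink
                                - lam * r * y_injured / 2.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Draw r away from zero, since Eq. (47) divides by lambda_i * rand(0,1).
    r = rng.uniform(0.05, 1.0)
    y_new = caviar_ws_update(0.4, 0.6, 0.2, lam=0.3, r=r, noise=rng.gauss(0.0, 1.0))
    print("updated position:", y_new)
    print("stream near sea:", wca_stream_update([0.1, 0.2], 0.1, rng))
    print("fitness:", fitness([0.9, 0.1], [1.0, 0.0]))
```

Because Eq. (47) divides by λ_i rand(0,1), a practical implementation would need to guard against values close to zero; the sketch simply draws the random number from a range that excludes it.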

Results and discussion

This section describes the results and discussion of the developed CAViaR-WS-based HAN with respect to the evaluation metrics, such as precision, recall, and f-measure.

Experimental setup

The proposed CAViaR-WS-based HAN is implemented in Python using the COVID-19 Real World Worry Dataset (RWWD) (COVID-19 Real World Worry Dataset 2021) on a PC with 2 GB RAM, Windows 10 OS, and an Intel Core i3 processor.

Dataset description

COVID-19 RWWD (COVID-19 Real World Worry Dataset 2021) is a ground-truth emotion dataset for signifying and expressing emotions in text. The dataset consists of 5,000 texts: 2,500 short texts and 2,500 long texts. For the long RWWD, the participants were asked to express their feelings freely in texts of open-ended length, whereas for the short RWWD, the same participants were asked to express their feelings in Tweet-sized texts. Thus, the dataset captures the worries and the emotional responses of the participants.

Evaluation metrics

The performance of the developed CAViaR-WS-based HAN method is analyzed based on three metrics, namely precision, recall, and F-measure.

Precision: Precision is the fraction of the classified emotions from the text review data that are relevant, and is expressed as,

$$P = \frac{\left|T_{l} \cap T_{r}\right|}{\left|T_{r}\right|} \tag{48}$$

Here, $T_{l}$ signifies the relevant emotions, and $T_{r}$ represents the classified emotions.

Recall: Recall is the fraction of the relevant emotions that are present among the classified emotions for the text review data, and is expressed as,

$$R = \frac{\left|T_{l} \cap T_{r}\right|}{\left|T_{l}\right|} \tag{49}$$

F-measure: F-measure is the harmonic mean of the precision and the recall measures, and is expressed as,

$$FM = \frac{2PR}{P+R} \tag{50}$$
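
The three metrics of Eqs. (48) to (50) can be illustrated with a short Python sketch. It is a minimal example under the assumption that the relevant emotions T_l and the classified emotions T_r are represented as sets of item identifiers; the function names and the toy data are hypothetical and are not taken from the paper's code.

```python
def precision(relevant, classified):
    """Eq. (48): |T_l ∩ T_r| / |T_r|."""
    relevant, classified = set(relevant), set(classified)
    return len(relevant & classified) / len(classified) if classified else 0.0


def recall(relevant, classified):
    """Eq. (49): |T_l ∩ T_r| / |T_l|."""
    relevant, classified = set(relevant), set(classified)
    return len(relevant & classified) / len(relevant) if relevant else 0.0


def f_measure(p, r):
    """Eq. (50): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Toy data: identifiers of texts that truly carry an emotion (T_l)
    # and of texts that the classifier assigned to that emotion (T_r).
    t_l = {1, 2, 3, 4}
    t_r = {3, 4, 5}
    p, r = precision(t_l, t_r), recall(t_l, t_r)
    print(f"P={p:.3f} R={r:.3f} FM={f_measure(p, r):.3f}")
```

In the toy data, two of the three classified items are relevant and two of the four relevant items are found, so precision exceeds recall and the F-measure lies between the two.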

Performance analysis

The performance assessment is carried out using evaluation metrics with respect to spam and emotion classification by varying the iterations.

  1. Analysis based on spam classification

    Figure 3 presents the performance of the developed CAViaR-WS-based HAN for spam classification as the number of iterations is varied, in terms of precision, recall, and f-measure. The precision analysis is illustrated in Fig. 3a. For 60% training data, the proposed CAViaR-WS-based HAN achieved a precision of 0.806 at iteration 5, 0.824 at iteration 10, 0.830 at iteration 15, and 0.897 at iteration 20. Figure 3b shows the recall analysis. For 80% training data, the recall of the proposed CAViaR-WS-based HAN is 0.936 at iteration 5, 0.947 at iteration 10, 0.951 at iteration 15, and 0.963 at iteration 20. The f-measure analysis is presented in Fig. 3c. For 70% training data, the developed CAViaR-WS-based HAN obtained an f-measure of 0.882 at iteration 5, 0.904 at iteration 10, 0.904 at iteration 15, and 0.925 at iteration 20.

  2. Analysis based on emotion classification

    Figure 4 presents the performance of the proposed CAViaR-WS-based HAN for emotion classification as the number of iterations is varied. The precision analysis is portrayed in Fig. 4a. For 70% training data, the proposed method achieved a precision of 0.781 at iteration 5, 0.832 at iteration 10, 0.851 at iteration 15, and 0.917 at iteration 20. Figure 4b shows the recall analysis. For 60% training data, the proposed CAViaR-WS-based HAN obtained a recall of 0.849 at iteration 5, 0.889 at iteration 10, 0.918 at iteration 15, and 0.953 at iteration 20. The f-measure analysis is shown in Fig. 4c. For 80% training data, the f-measure of the proposed CAViaR-WS-based HAN is 0.865 at iteration 5, 0.882 at iteration 10, 0.894 at iteration 15, and 0.936 at iteration 20.

Fig. 3. Analysis using spam classification with respect to iterations: (a) Precision, (b) Recall, (c) F-measure

Fig. 4. Analysis using emotion classification: (a) Precision, (b) Recall, (c) F-measure

Comparative methods

The performance of the developed CAViaR-WS-based HAN method is compared with existing techniques, namely Deep Learning (Jabreel and Moreno 2019), the Lexicon-based approach (Kamal et al. 2019), and the Deep Convolutional Neural Network (DNN) (Stojanovski et al. 2018).

Comparative analysis

This section explains the comparative assessment of the developed CAViaR-WS-based HAN method by varying the training data in terms of evaluation metrics.

  1. Analysis based on spam classification

    The assessment of the developed CAViaR-WS-based HAN for spam classification, obtained by changing the training data percentage, is shown in Fig. 5. Figure 5a portrays the precision analysis. For 80% training data, the precision of the proposed CAViaR-WS-based HAN is 0.900, whereas the precision of the existing techniques is 0.842 for Deep learning, 0.874 for the Lexicon-based approach, and 0.875 for DNN. The performance improvement achieved by the developed CAViaR-WS-based HAN over Deep learning, the Lexicon-based approach, and DNN is 6.386%, 2.796%, and 2.757%, respectively. The recall analysis is illustrated in Fig. 5b. For 70% training data, the recall is 0.886 for Deep learning, 0.916 for the Lexicon-based approach, 0.920 for DNN, and 0.939 for the proposed CAViaR-WS-based HAN. The performance gain of the developed CAViaR-WS-based HAN over these existing techniques is 5.669%, 2.491%, and 2.041%, respectively. Figure 5c presents the f-measure analysis. For 60% training data, the f-measure is 0.828 for Deep learning, 0.843 for the Lexicon-based approach, 0.845 for DNN, and 0.866 for the proposed CAViaR-WS-based HAN. The performance improvement of the developed CAViaR-WS-based HAN is 4.434% over Deep learning, 2.730% over the Lexicon-based approach, and 2.461% over DNN.

  2. Analysis based on emotion classification

    Figure 6 presents the assessment of the developed CAViaR-WS-based HAN for emotion classification, obtained by changing the training data percentage, in terms of precision, recall, and f-measure. The precision analysis is illustrated in Fig. 6a. For 80% training data, the precision is 0.751 for Deep learning, 0.817 for the Lexicon-based approach, 0.822 for DNN, and 0.841 for the proposed CAViaR-WS-based HAN. The performance improvement achieved by the developed CAViaR-WS-based HAN over Deep learning, the Lexicon-based approach, and DNN is 10.667%, 2.839%, and 2.219%, respectively. Figure 6b shows the recall analysis. For 70% training data, the recall of the proposed CAViaR-WS-based HAN is 0.905, whereas the recall of the existing techniques is 0.721 for Deep learning, 0.845 for the Lexicon-based approach, and 0.849 for DNN. The performance gain of the developed CAViaR-WS-based HAN over these existing techniques is 20.334%, 6.586%, and 6.203%, respectively. The f-measure analysis is presented in Fig. 6c. For 80% training data, the f-measure is 0.780 for Deep learning, 0.856 for the Lexicon-based approach, 0.861 for DNN, and 0.882 for the proposed CAViaR-WS-based HAN. The performance improvement of the developed CAViaR-WS-based HAN over Deep learning, the Lexicon-based approach, and DNN is 11.604%, 2.951%, and 2.465%, respectively.
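
The text does not state how the performance improvement percentages are computed. As an assumption, they appear to be consistent, up to rounding of the reported three-decimal scores, with the gain taken relative to the proposed method's score, that is, (proposed - existing) / proposed × 100. The sketch below applies this assumed formula to the emotion-classification recall values reported above for 70% training data; the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(proposed, existing):
    """Assumed formula: gain of the proposed score over an existing score,
    expressed as a percentage of the proposed score."""
    return (proposed - existing) / proposed * 100.0


if __name__ == "__main__":
    proposed_recall = 0.905
    existing_recall = {
        "Deep learning": 0.721,
        "Lexicon-based approach": 0.845,
        "DNN": 0.849,
    }
    for name, value in existing_recall.items():
        gain = relative_improvement(proposed_recall, value)
        print(f"{name}: {gain:.3f}%")
```

The values produced this way differ slightly from the reported 20.334%, 6.586%, and 6.203%, which would be expected if the reported gains were computed from unrounded scores.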

Fig. 5. Analysis with respect to spam classification: (a) Precision, (b) Recall, (c) F-measure

Fig. 6. Analysis with respect to emotion classification: (a) Precision, (b) Recall, (c) F-measure

Comparative discussion

This section presents the comparative assessment of the developed CAViaR-WS-based HAN for spam classification and emotion classification, based on the performance metrics obtained by varying the training data percentage. Table 1 presents the comparative discussion for 90% training data. In spam classification, the precision is 0.902 for Deep learning, 0.908 for the Lexicon-based approach, 0.909 for DNN, and 0.932 for the developed CAViaR-WS-based HAN. Likewise, the recall of the existing techniques, namely Deep learning, the Lexicon-based approach, and DNN, is 0.933, 0.933, and 0.943, respectively, whereas the developed CAViaR-WS-based HAN achieved a recall of 0.965. The f-measure achieved by the proposed CAViaR-WS-based HAN is 0.948, while Deep learning, the Lexicon-based approach, and DNN achieved an f-measure of 0.917, 0.920, and 0.926, respectively. In emotion classification, the precision of Deep learning, the Lexicon-based approach, and DNN is 0.763, 0.912, and 0.912, respectively. In contrast, the developed CAViaR-WS-based HAN achieved a precision of 0.937. The recall is 0.871 for Deep learning, 0.914 for the Lexicon-based approach, 0.924 for DNN, and 0.958 for the proposed CAViaR-WS-based HAN. The developed CAViaR-WS-based HAN obtained an f-measure of 0.948, whereas Deep learning, the Lexicon-based approach, and DNN achieved an f-measure of 0.813, 0.913, and 0.918, respectively. The table shows that, for emotion classification, the proposed method achieved the maximum precision of 0.937, recall of 0.958, and F-measure of 0.948.

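Table 1. Comparative discussion

| Classification | Metrics | Deep learning | Lexicon-based approach | DNN | Proposed CAViaR-based HAN |
|---|---|---|---|---|---|
| Spam classification | Precision | 0.902 | 0.908 | 0.909 | 0.932 |
| Spam classification | Recall | 0.933 | 0.933 | 0.943 | 0.965 |
| Spam classification | F-measure | 0.917 | 0.920 | 0.926 | 0.948 |
| Emotion classification | Precision | 0.763 | 0.912 | 0.912 | 0.937 |
| Emotion classification | Recall | 0.871 | 0.914 | 0.924 | 0.958 |
| Emotion classification | F-measure | 0.813 | 0.913 | 0.918 | 0.948 |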
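
As a consistency check on Table 1, the F-measure in each column can be recomputed from the reported precision and recall using Eq. (50). The sketch below is illustrative only: the dictionary layout and the names `TABLE_1`, `f_measure`, and `max_f_measure_gap` are hypothetical, and the values are copied from the table.

```python
# (precision, recall, reported F-measure) for each task and method in Table 1.
TABLE_1 = {
    ("spam", "Deep learning"): (0.902, 0.933, 0.917),
    ("spam", "Lexicon-based approach"): (0.908, 0.933, 0.920),
    ("spam", "DNN"): (0.909, 0.943, 0.926),
    ("spam", "Proposed"): (0.932, 0.965, 0.948),
    ("emotion", "Deep learning"): (0.763, 0.871, 0.813),
    ("emotion", "Lexicon-based approach"): (0.912, 0.914, 0.913),
    ("emotion", "DNN"): (0.912, 0.924, 0.918),
    ("emotion", "Proposed"): (0.937, 0.958, 0.948),
}


def f_measure(p, r):
    """Eq. (50): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def max_f_measure_gap(table):
    """Largest absolute gap between recomputed and reported F-measure."""
    return max(abs(f_measure(p, r) - f) for p, r, f in table.values())


if __name__ == "__main__":
    for (task, method), (p, r, f) in TABLE_1.items():
        print(f"{task:8s} {method:24s} recomputed={f_measure(p, r):.4f} reported={f:.3f}")
    print("largest gap:", round(max_f_measure_gap(TABLE_1), 4))
```

Every recomputed value agrees with the reported F-measure to within 0.001, which is compatible with the table entries having been rounded to three decimals.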

Conclusion

This research presents a robust and effective emotion classification mechanism, named the CAViaR-WS-based HAN, for classifying the emotions in COVID-19 text review data. The proposed CAViaR-WS is devised by incorporating WCA into CAViaR-SF. First, features such as the mean, variance, TF-IDF, SentiWordNet features, and spam word-based features are extracted. RideNN is then utilized to perform feature fusion, and spam classification is carried out using the CAViaR-SF-based GAN. Finally, emotion classification is performed by the HAN classifier, which is trained using the proposed CAViaR-WS optimization algorithm. The developed technique is effective in accurately classifying the emotions and in improving the training speed. The performance of the developed CAViaR-WS-based HAN is assessed using precision, recall, and F-measure, for which it attained maximum values of 0.937, 0.958, and 0.948, respectively. However, the performance of the proposed method needs to be enhanced further to achieve more accurate emotion classification. Hence, in future work, other hybrid optimization algorithms will be developed to improve the emotion classification performance.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

B. Venkateswarlu, Email: bvenki289@gmail.com

V. Viswanath Shenoi, Email: shenoi.vis@gmail.com.

Praveen Tumuluru, Email: praveenluru@gmail.com.

References

  1. Agarwal A, Xie B, Vovsha I, Rambow O, Passonneau RJ (2011) Sentiment analysis of twitter data. In Proceedings of the Workshop on Language in Social Media (LSM 2011), pp 30–38
  2. Aher CN. Optimized SVM model for vehicle tracking and positioning in GSM network. Int J Creat Res Thoughts (IJCRT) 2018;6(1):1355–1359.
  3. Aher CN, Jena AK. Rider-chicken optimization dependent recurrent neural network for cancer detection and classification using gene expression data. Comput Methods Biomech Biomed Eng: Imaging Visual. 2021;9(2):174–191.
  4. Ahmad S, Asghar MZ, Alotaibi FM, Awan I. Detection and classification of social media-based extremist affiliations using sentiment analysis techniques. Human-Centric Comput Inf Sci. 2019;9(1):1–23.
  5. Alm CO, Roth D, Sproat R (2005) Emotions from text: machine learning for text-based emotion prediction. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp 579–586
  6. Arulmurugan R, Sabarmathi KR, Anandakumar H. Classification of sentence level sentiment analysis using cloud machine learning techniques. Clust Comput. 2019;22(1):1199–1209. doi: 10.1007/s10586-017-1200-1.
  7. Asghar MZ, Kundi FM, Ahmad S, Khan A, Khan F. T-SAF: Twitter sentiment analysis framework using a hybrid classification scheme. Expert Syst. 2018;35(1):e12233. doi: 10.1111/exsy.12233.
  8. Balabantaray RC, Mohammad M, Sharma N. Multi-class twitter emotion classification: A new approach. Int J Appl Inf Syst. 2012;4(1):48–53.
  9. Binu D, Kariyappa BS. RideNN: A new rider optimization algorithm-based neural network for fault diagnosis in analog circuits. IEEE Trans Instrum Meas. 2018;68(1):2–26. doi: 10.1109/TIM.2018.2836058.
  10. Cambria E. Affective computing and sentiment analysis. IEEE Intell Syst. 2016;31(2):102–107. doi: 10.1109/MIS.2016.31.
  11. Chen Y, Zhou Y, Zhu S, Xu H (2012) Detecting offensive language in social media to protect adolescent online safety. In International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, pp 71–80
  12. Chintala S (2012) Sentiment Analysis using neural architectures. New York University
  13. Christian H, Agus MP, Suhartono D (2016) Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF). ComTech: Comput, Math Eng Appl 7(4): 285-294
  14. COVID-19 Real World Worry Dataset, https://github.com/ben-aaron188/covid19worry, accessed on January 2021.
  15. Engle RF, Manganelli S. CAViaR: Conditional autoregressive value at risk by regression quantiles. J Business Econ Stat. 2004;22(4):367–381. doi: 10.1198/073500104000000370.
  16. Eskandar H, Sadollah A, Bahreininejad A, Hamdi M. Water cycle algorithm–A novel metaheuristic optimization method for solving constrained engineering optimization problems. Comput Struct. 2012;110:151–166. doi: 10.1016/j.compstruc.2012.07.010.
  17. Illendula A, Sheth A (2019) Multimodal emotion classification. In Companion Proceedings of The 2019 World Wide Web Conference, pp 439–449
  18. Jabreel M, Moreno A. A deep learning-based approach for multi-label emotion classification in tweets. Appl Sci. 2019;9(6):1123. doi: 10.3390/app9061123.
  19. Jung Y, Park H, Myaeng SH (2006) A hybrid mood classification approach for blog text. In Pacific Rim International Conference on Artificial Intelligence, pp 1099–1103
  20. Kamal R, Shah MA, Maple C, Masood M, Wahid A, Mehmood A. Emotion classification and crowd source sensing; a lexicon based approach. IEEE Access. 2019;7:27124–27134. doi: 10.1109/ACCESS.2019.2892624.
  21. Kleinberg B, van der Vegt I, Mozes M (2020) Measuring emotions in the covid-19 real world worry dataset
  22. Lee CM, Narayanan SS. Toward detecting emotions in spoken dialogs. IEEE Trans Speech Audio Process. 2005;13(2):293–303. doi: 10.1109/TSA.2004.838534.
  23. Mathur A, Kubde P, Vaidya S (2020) Emotional analysis using twitter data during pandemic situation: COVID-19. In 5th International Conference on Communication and Electronics Systems (ICCES), pp 845–848
  24. Medford RJ, Saleh SN, Sumarsono A, Perl TM, Lehmann CU (2020) An infodemic: leveraging high-volume twitter data to understand public sentiment for the COVID-19 outbreak
  25. Mohammad SM. Emotion measurement. UK: Woodhead Publishing; 2016. Sentiment analysis: Detecting valence, emotions, and other affectual states from text; pp. 201–237.
  26. Ohana B, Tierney B. Sentiment classification of reviews using SentiWordNet. Proceed IT & T Conf. 2009;13:18–30.
  27. Saeed NM, Helal NA, Badr NL, Gharib TF (2018) The impact of spam reviews on feature-based sentiment analysis. In proceedings of 13th International Conference on Computer Engineering and Systems (ICCES)
  28. Shadravan S, Naji HR, Bardsiri VK. The sailfish optimizer: a novel nature-inspired metaheuristic algorithm for solving constrained engineering optimization problems. Eng Appl Artif Intell. 2019;80:20–34. doi: 10.1016/j.engappai.2019.01.001.
  29. Stojanovski D, Strezoski G, Madjarov G, Dimitrovski I, Chorbev I. Deep neural network architecture for sentiment analysis and emotion identification of Twitter messages. Multimedia Tools Appl. 2018;77(24):32213–32242. doi: 10.1007/s11042-018-6168-1.
  30. Tai Y, He H, Zhang W, Jia Y (2018) Automatic generation of review content in specific domain of social network based on RNN. In IEEE Third International Conference on Data Science in Cyberspace (DSC), pp 601-608
  31. Usman M, Latif S, Asim M, Lee BD, Qadir J. Retrospective motion correction in multishot MRI using generative adversarial network. Sci Rep. 2020;10(1):1–11. doi: 10.1038/s41598-020-61705-9.
  32. Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E (2016) Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 1480–1489
  33. Yen HY, Lin PH, Lin R. Emotional product design and perceived brand emotion. Int J Adv Psychol (IJAP) 2014;3(2):59–66. doi: 10.14355/ijap.2014.0302.05.
  34. Yu J, Marujo L, Jiang J, Karuturi P, Brendel W (2018) Improving multi-label emotion classification via sentiment classification with dual attention transfer network.

Articles from Social Network Analysis and Mining are provided here courtesy of Nature Publishing Group
