Heliyon. 2024 Mar 21;10(9):e27863. doi: 10.1016/j.heliyon.2024.e27863

Sentiment analysis of Arabic social media texts: A machine learning approach to deciphering customer perceptions

Ohud Alsemaree, Atm S Alam, Sukhpal Singh Gill, Steve Uhlig
PMCID: PMC11070797  PMID: 38711635

Abstract

Sentiment analysis (SA) is a subfield of artificial intelligence that involves natural language processing. It has become increasingly significant because it discerns the emotional tone of reviews, categorising them as positive, neutral, or negative. In the highly competitive coffee industry, understanding customer sentiment and perception is paramount for businesses seeking to optimise their product offerings. Traditional methods of market analysis often fail to capture the nuanced views of consumers, necessitating a more sophisticated approach to sentiment analysis. This research is motivated by the need for a nuanced understanding of customer sentiment across various coffee products, enabling companies to make informed decisions about product promotion, improvement, and discontinuation. However, analysing Arabic text is challenging because of the language's extraordinarily complex inflectional and derivational morphology. To address this challenge, we developed a new method designed to improve the precision and effectiveness of Arabic sentiment analysis, focusing on customer opinions about various coffee products on social media platforms such as Twitter. We gathered 10,646 Twitter reviews of various coffee products and applied feature extraction using term frequency-inverse document frequency (TF-IDF) and minimum redundancy maximum relevance (MRMR). We then performed sentiment analysis using four supervised learning algorithms: k-nearest neighbor, support vector machine, decision tree, and random forest. The classifiers' predictions were combined via ensemble learning to produce the final results. Our method increased prediction accuracy, achieving 95.95% accuracy with hard voting and 94.51% with soft voting.

Keywords: Arabic text, Feature extraction, Machine learning, Sentiment analysis, Social media

1. Introduction

Processing the ever-growing volume of data, such as that obtained from social media, is a formidable and time-intensive challenge. Recent statistics indicate that there are nearly 4.95 billion social media users, more than half of the global population [1]. This volume increases the complexity of data analysis and has driven the development of advanced artificial intelligence (AI) and natural language processing (NLP) tools. These technologies are essential for sentiment analysis (SA), which is instrumental in decoding consumer sentiment about brands, products, and services. SA draws on feedback from diverse sources, ranging from social media posts to comprehensive market research [2]. It also helps decision-makers in policy, entrepreneurship, and professional fields to improve their decision-making [3]. The goal of SA is to examine people's feelings, emotions, and attitudes towards entities and their attributes, as conveyed in written language [4]. AI and NLP are therefore not merely beneficial but necessary for effective sentiment analysis in today's data-rich world. Research on SA has progressed significantly for English and other widely used languages. Arabic ranks sixth among the most used languages on the internet and is notably influential on social media [5]. However, although research on Arabic SA has advanced, the language's complexities, such as its right-to-left reading direction, diverse dialects with varying phonology, and intricate morphology and syntax, present unique challenges and hinder the development of standard lexicons [6,7].
Three main SA approaches exist: the lexicon-based approach, which relies on a pre-established lexicon of sentiment terms [8]; machine learning, which uses probabilistic models to identify sentiments [2]; and the hybrid approach, which combines both [9]. These methods aim to determine sentiment polarity in social media texts and help companies understand consumer views on their products and services, supporting better strategic planning and helping them meet customer demands [10]. These applications also help businesses identify emerging trends and patterns in customer sentiment over time. By tracking changes in sentiment, businesses can identify shifts in customer preferences, anticipate market trends, and adapt their strategies accordingly. By comparing sentiment scores and feature ratings across similar products or brands, businesses can assess their competitive position and identify opportunities for differentiation. Despite the growing body of research on sentiment analysis using machine learning, there remains a notable gap in applying these approaches to specific consumer product domains such as coffee, for which demand is growing in the Saudi market. This paper addresses this gap by highlighting the value of extracting insights from customer perceptions through machine-learning-based sentiment analysis and by demonstrating a highly efficient sentiment prediction model for analysing opinions about coffee products, offering new insights into customer preferences and sentiment trends within this market segment.

In light of these challenges, and motivated by the abundance of Arabic social media texts, our work employs an ensemble approach that integrates four classifiers: Support Vector Machines (SVM), Decision Trees (DT), Random Forests (RF), and k-nearest Neighbors (KNN). This approach, which we term the customer perception meta-ensemble, is combined with optimized feature selection techniques to analyze customer opinions on coffee products.

Our contributions to SA research in Arabic are as follows.

  • This study introduces a novel ensemble model designed specifically for Arabic Twitter reviews of various coffee products, aiming to achieve maximum accuracy in sentiment prediction while ensuring high efficiency.

  • This work aggregates and analyzes a large dataset of 10,646 product reviews. Through meticulous analysis, we categorize feedback into positive, negative, or neutral sentiments, enabling businesses to grasp overall sentiment trends and pinpoint factors driving customer satisfaction or dissatisfaction.

  • Comparative study of feature extraction algorithms: we conduct a comparative investigation of TF-IDF and MRMR feature extraction algorithms to enhance the accuracy and robustness of our sentiment analysis model by evaluating their effectiveness in extracting meaningful features from product reviews.

  • Evaluation of the performance of four baseline classifiers, SVM, DT, RF, and KNN, to refine our model's predictions.

  • Development of the proposed customer perception meta-ensemble model, which combines the four classifiers to maximize accuracy.

The remainder of the paper is organized as follows. Section 2 reviews related work, Section 3 describes the methodology, Section 4 presents the results and discussion, and Section 5 concludes the paper.

2. Related work

In this section, we review sentiment analysis studies that use supervised machine learning to examine customer comments and gain insight into customers' opinions and perceptions. We compare the supervised algorithms these studies used and their effectiveness, as well as the impact of different feature extraction methods on accuracy. Each study is examined according to the algorithms used, the features employed, the data source and customer-feedback domain, and the highest accuracy reported.

The authors of Ref. [11] addressed Arabic text classification for SA of Mubasher products, using feedback collected from Saudi Arabian tweets and applying Naive Bayes (NB) and SVM classifiers. The study considered two classes, positive and negative, and two feature extraction methods, Term Frequency-Inverse Document Frequency (TF-IDF) and Binary-Term Occurrence (BTO). NB achieved 81.74% with TF-IDF and 82.70% with BTO, while SVM achieved 89.60% with TF-IDF and 88.81% with BTO.

In a similar study, Ref. [12] examined the performance of three classifiers, logistic regression (LR), KNN, and DT, for SA of Arabic tweets. The authors employed two feature extraction methods, TF-IDF and Binary-Term Occurrence (BTO), and considered only two classes (positive and negative). The highest accuracy, 92%, was achieved by DT.

Ref. [13] introduced a multi-criteria approach for identifying the most suitable machine learning algorithm for sentiment analysis of Arabic dialects. The authors tested it on Saudi Arabian product reviews and compared five popular ML classifiers: SVM, KNN, NB, DT, and deep learning. They used TF-IDF and BTO as feature extraction methods and considered only two classes (positive and negative). Deep learning achieved 85.25% accuracy, outperforming SVM, DT, KNN, and NB (82.30%).

Ref. [14] analyzed reviews of electronic products on Amazon, with the aim of developing an efficient predictive model that classifies reviews as positive or negative. The researchers compared three popular feature extraction techniques, Term Frequency-Inverse Document Frequency, Bag of Words (BoW), and Word2Vec, and identified the most suitable predictive model among Support Vector Machine, Naïve Bayes, Decision Tree, Random Forest, and Multi-Layer Perceptron (MLP). In most cases, MLP produced the best classification results, achieving an accuracy of 92%.

In the same year, Ref. [15] evaluated customer reviews of restaurants and cafes in the Qassim region of Saudi Arabia. The study used TF-IDF feature extraction and focused on the distribution of opinion classes (positive and negative). Comparing five classifiers, Support Vector Machine, logistic regression, K-Nearest Neighbors, Naïve Bayes, and Random Forest, the authors found that SVM achieved the highest accuracy (89%), outperforming the other classifiers.

Several studies, such as Refs. [12], [13], and [14], agree on the need for feature extraction in SA and indicate that TF-IDF and BOW can improve performance. Ref. [14] additionally used Word2Vec alongside TF-IDF and BOW, which led to an improvement in accuracy. The current study differs from these works by using two feature techniques, TF-IDF and MRMR, which improved accuracy in our experiments. Table 1 summarizes the related works discussed above.

Table 1.

Comprehensive comparative analysis of related works in sentiment analysis for the Arabic language.

Study | Algorithms | Data source | Features | Best result
[11] | NB, SVM | Mubasher software products | TF-IDF, BOW | NB 88.81%
[12] | LR, KNN, DT | Different sources | TF-IDF, BOW | DT 92%
[13] | SVM, KNN, NB, DT, deep learning | Corpus of Saudi tweets | TF-IDF, BOW | SVM 85.25%
[14] | SVM, NB, DT, LR, RF, MLP | Amazon Prime | TF-IDF, BOW, W2Vec | MLP 92.00%
[15] | SVM, LR, KNN, NB, RF, DT | Customer reviews from cafes and restaurants | TF-IDF | SVM 89%
This study | KNN, SVM, DT, RF | Customer perception reviews (coffee) | TF-IDF, MRMR | SVM 94.95%

3. Methodology

The proposed SA model is based on machine learning and operates at the sentence level. This research aimed to shed light on Arabic-speaking customers' perceptions, as reflected in their tweets, to help predict increases or decreases in product purchases. Our results should also help companies and organisations decide whether to raise or lower purchase orders. The model involves eight steps: (1) collection of customer reviews from social media (Twitter); (2) manual annotation of the collected data; (3) pre-processing to clean the dataset; feature extraction using (4) TF-IDF and (5) MRMR; (6) classification of customer perception using four machine learning algorithms (KNN, SVM, DT, RF); (7) ensemble learning, which combines the predictions of these classifiers; and (8) acquisition of the results. Fig. 1 presents the proposed text-mining framework used to perform SA on Twitter data in order to understand customers' perceptions. Below, we formulate the task and discuss the stages of the framework in detail.

Fig. 1.

Workflow of the proposed customer perception method.

We formulate the task undertaken in this work as follows. Given a tweet represented as a sequence of tokens, the goal of the customer perception task is to predict the sentiment label l ∈ {positive, negative, neutral} that correctly reflects the feelings or opinions of the post's author.

3.1. Data collection

In our study, we used Twitter to collect data on coffee products, as Twitter is a popular social media platform in Saudi Arabia. A total of 10,646 tweets were collected at the sentence level using the Twitter Application Programming Interface (API). The collected tweets mainly concern different coffee providers in the Saudi market and were retrieved using hashtag keywords for coffee brands such as Dunkin Coffee, Saudi Costa, Starbucks Coffee, Barneys Coffee, Tim Hortons Coffee, Half a Million Coffee, Dr. Cafe Coffee, and Overdose Coffee.

3.2. Data annotation

The collected tweets were annotated with three sentiment classes: positive, negative, and neutral. Annotation can be carried out manually, through crowdsourcing, or automatically [16]. Assigning labels required careful consideration and subjective interpretation based on the goals and objectives of this study. We therefore used manual annotation: we defined the sentiment classes, using the context of interest as guidance, and then labelled the tweets manually. After pre-processing, the annotated dataset contained 3107 positive, 2945 negative, and 3064 neutral tweets, as shown in Table 2.

Table 2.

Statistics of the manually annotated dataset.

3.3. Pre-processing

Pre-processing plays an important role in NLP: it removes useless information, reduces noise, and improves data quality, making it easier, more effective, and more accurate for a model to process textual data. The annotated tweets were rigorously pre-processed to prepare them for training and evaluation, as follows.

3.3.1. Data cleaning

In this step, we removed noise and useless details from the data, specifically through the following procedures.

  • Hashtag removal: all words that begin with a hashtag (#) were removed.

  • Punctuation and number removal: punctuation marks and numbers were removed from the tweets and excluded from the determination of sentiment.

  • Diacritic removal: diacritics, which do not affect sentiment, were eliminated.

  • Removal of duplicate tweets: duplicate tweets were removed so that they would not bias the SA measurements.

  • URL removal: URLs carry no useful information for SA, so they were removed.
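The cleaning steps above can be illustrated with a minimal Python sketch. It is not the authors' code: the regular expressions, the Unicode range taken to cover Arabic diacritics (U+064B to U+0652, plus U+0670), and the helper names `clean_tweet` and `clean_corpus` are assumptions made for illustration. Punctuation and digits are replaced with spaces, and duplicates are detected after cleaning.

```python
import re

# Assumed patterns; the paper does not publish its exact cleaning rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\S+")                      # whole words starting with '#'
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670]")  # Arabic tashkeel marks and superscript alef
DIGIT_RE = re.compile(r"\d")                          # ASCII and Arabic-Indic digits (Unicode Nd)
PUNCT_RE = re.compile(r"[^\w\s]|_")                   # punctuation, symbols, emoji, and underscores
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Apply the cleaning steps of Section 3.3.1 to a single tweet."""
    text = URL_RE.sub(" ", text)        # URLs first, so their punctuation is not split apart
    text = HASHTAG_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)  # delete (not space-replace) so words stay intact
    text = DIGIT_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def clean_corpus(tweets):
    """Clean every tweet, then drop empty results and exact duplicates (first occurrence kept)."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        c = clean_tweet(tweet)
        if c and c not in seen:
            seen.add(c)
            cleaned.append(c)
    return cleaned
```

For example, `clean_corpus(raw_tweets)` returns the cleaned, de-duplicated tweets in their original order (`raw_tweets` is a hypothetical list of strings). Because duplicates are compared after cleaning, tweets that differ only in URLs, hashtags, or punctuation are treated as duplicates; the paper does not state at which stage it de-duplicated.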

3.3.2. Filtering of stop words

Stop words are words and particles that occur very frequently in Arabic and English texts but carry little sentiment information; they were filtered out. Examples include:

(في ، علي ، ال، هم، هي، هو).
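Stop-word filtering amounts to dropping tokens that appear in a stop list. The sketch below assumes whitespace tokenization and uses only the six example words above as a hypothetical stop list (`ARABIC_STOP_WORDS`); the function name is also hypothetical.

```python
# Hypothetical stop list containing only the six examples given in Section 3.3.2;
# a real system would use a much larger Arabic stop-word list.
ARABIC_STOP_WORDS = {"في", "علي", "ال", "هم", "هي", "هو"}


def remove_stop_words(text: str, stop_words=ARABIC_STOP_WORDS) -> str:
    """Drop whitespace-separated tokens that appear in the stop list."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)
```

Because matching is on whole tokens, ال is removed only when it appears as a separate token, not when it is attached as the definite-article prefix of a word such as المقهى.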

3.3.3. Normalization

Normalization refers to substituting similar characters that can be interchangeably used [16].

Examples include replacing أ and إ with ا, ة with ه, and ى with ي. Fig. 2 summarizes our pre-processing steps, and Table 3 presents the dataset statistics before and after processing.
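These character substitutions can be applied in one pass with Python's `str.translate`. This is a minimal sketch assuming exactly the four character pairs listed above; the names `NORMALIZATION_TABLE` and `normalize_arabic` are hypothetical.

```python
# Character mappings listed in Section 3.3.3, written as code points for clarity:
# U+0623 (أ) and U+0625 (إ) -> U+0627 (ا); U+0629 (ة) -> U+0647 (ه); U+0649 (ى) -> U+064A (ي)
NORMALIZATION_TABLE = str.maketrans({
    "\u0623": "\u0627",
    "\u0625": "\u0627",
    "\u0629": "\u0647",
    "\u0649": "\u064A",
})


def normalize_arabic(text: str) -> str:
    """Replace interchangeable Arabic letter variants with a single canonical form."""
    return text.translate(NORMALIZATION_TABLE)
```

If normalization is applied before stop-word filtering, a token written as على (with ى) becomes علي and then matches the stop word علي from Section 3.3.2.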

Fig. 2.

Summary of our pre-processing methods.

Table 3.

Dataset statistics before and after processing.

Class | Tweets before processing | Tweets after processing
Positive (+1) | 4912 | 3107
Negative (−1) | 3461 | 2945
Neutral (0) | 2273 | 3064

3.4. Feature extraction

Before data are provided to train algorithms, a crucial task is to perform feature extraction, which significantly affects the effectiveness of SA classification. We conducted feature extraction to identify words relevant to the analysis of customer perceptions and feelings for the purpose of understanding these individuals’ views. We employed several algorithms to effectively select a subset of words. This process helps reduce noise in sentences that need to be classified [17]. Specifically, we used TF-IDF and MRMR.

3.4.1. Term frequency-inverse document frequency (TF-IDF)

After pre-processing, each tweet was converted into a vector of TF-IDF scores for the words it contains. TF-IDF is a popular statistical method for evaluating the importance of a word in a document based on its frequency of occurrence [18]. It comprises two components: term frequency (TF) and inverse document frequency (IDF) [19]. The TF-IDF of a word w in a processed tweet s within the corpus D of processed tweets is denoted TFIDF(w, s, D) and is calculated as shown in (1):

$$\mathrm{TFIDF}(w,s,D) = \mathrm{tf}(w,s) \times \log\frac{N}{\mathrm{df}_w} \qquad (1)$$

where tf(w, s) is the frequency of w in processed tweet s, N is the total number of processed tweets in corpus D, and df_w is the number of tweets in D that contain w.
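Eq. (1) can be computed directly with a short Python function. This is a sketch under two assumptions not stated in the text: tf(w, s) is the raw count of w in s, and log is the natural logarithm. The function name `tfidf` is hypothetical. Note that a word occurring in every tweet receives a weight of zero, since log(N/N) = 0.

```python
import math
from collections import Counter


def tfidf(corpus):
    """Eq. (1): TFIDF(w, s, D) = tf(w, s) * log(N / df_w) for every word w of every tweet s.

    corpus: list of tokenized tweets (lists of strings). Returns one dict per tweet.
    tf is the raw count and log is the natural logarithm (both assumptions).
    """
    n_docs = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # each tweet counts at most once per word
    vectors = []
    for tokens in corpus:
        tf = Counter(tokens)
        vectors.append({w: count * math.log(n_docs / df[w]) for w, count in tf.items()})
    return vectors
```

In practice, scikit-learn's `TfidfVectorizer` is a common choice, but its defaults differ from Eq. (1): with `smooth_idf=True` it uses idf = ln((1 + N)/(1 + df_w)) + 1, and it L2-normalizes each tweet vector (`norm='l2'`).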

3.4.2. Minimum redundancy and maximum relevance (MRMR)

MRMR (minimum redundancy, maximum relevance) is a feature-ranking algorithm; the MRMR feature selection framework was proposed by Peng et al. [20]. Its goal is to identify a subset of features (words or terms) that are both highly relevant to the task at hand (maximum relevance) and minimally redundant with one another (minimum redundancy). We used this algorithm to obtain the most relevant features from review texts for understanding customer perceptions. This improved the efficiency and effectiveness of our model, because the selected features collectively provide relevant information without unnecessary redundancy. For a categorical target, such as the sentiment classes in this study (positive, negative, neutral), relevance and redundancy are measured using mutual information (MI). MI measures the statistical dependence between random variables [21], quantifying how much information one variable X carries about another variable Y. In feature selection, MRMR uses MI to identify the attributes in a dataset that are informative about the target variable. We first compute the MI between each word in a tweet and the corresponding label from their joint and marginal probabilities, as shown in (2):

$$I(A;B) = \sum_{a \in A} \sum_{b \in B} p(a,b) \log\frac{p(a,b)}{p(a)\,p(b)} \qquad (2)$$

where a is a word, b is the target label, p(a) and p(b) are the marginal probabilities of a and b, and p(a, b) is their joint probability.

Relevance is measured by the MI between a feature and the target class. A small MI value in (2) indicates a weak dependence between a and b, and a large value indicates a strong dependence. Features should therefore have maximal MI with the target class c; this maximal relevance criterion is expressed in (3):

$$\max D(X,c), \quad D = \frac{1}{|X|} \sum_{x_i \in X} I(x_i; c) \qquad (3)$$

where I(x_i; c) is the MI between feature x_i and the target class c, and |X| is the number of features in the subset X. However, selecting features according to the maximum relevance criterion alone can introduce a large amount of redundancy. Therefore, a minimum redundancy criterion is introduced, as shown in (4):

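$$\min R(X), \quad R = \frac{1}{|X|^{2}} \sum_{x_i, x_j \in X} I(x_i; x_j) \qquad (4)$$

where I(x_i; x_j) is the MI between features x_i and x_j.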
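Eqs. (2) to (4) are usually optimized greedily: starting from the most relevant word, each step adds the word with the best trade-off between relevance to the label and mean redundancy with the words already chosen. The sketch below uses the difference form (relevance minus mean redundancy), the "MID" scheme of Peng et al. [20]. The text does not state which combination was used, so this is an assumption, as is treating each word as a binary present/absent feature so that Eq. (2) can be evaluated on discrete values. The function names are hypothetical.

```python
import numpy as np


def mutual_information(a, b):
    """Eq. (2): I(A;B) = sum_a sum_b p(a,b) log(p(a,b) / (p(a) p(b))) for discrete arrays."""
    a = np.asarray(a)
    b = np.asarray(b)
    mi = 0.0
    for av in np.unique(a):
        p_a = np.mean(a == av)
        for bv in np.unique(b):
            p_ab = np.mean((a == av) & (b == bv))
            if p_ab > 0:  # terms with p(a,b) = 0 contribute nothing
                p_b = np.mean(b == bv)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return float(mi)


def mrmr_select(X, y, k):
    """Greedy MRMR using the MID criterion: relevance minus mean redundancy.

    X: (n_samples, n_features) matrix of discrete features, e.g. binary word presence.
    y: sentiment labels. Returns the indices of up to k selected features, in selection order.
    """
    X = np.asarray(X)
    n_features = X.shape[1]
    if k <= 0:
        return []
    relevance = np.array([mutual_information(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    redundancy_sum = np.zeros(n_features)   # running sum of I(x_j; x_i) over selected x_i
    while len(selected) < min(k, n_features):
        last = selected[-1]
        for j in range(n_features):
            if j not in selected:
                redundancy_sum[j] += mutual_information(X[:, j], X[:, last])
        scores = relevance - redundancy_sum / len(selected)
        scores[selected] = -np.inf          # never pick a feature twice
        selected.append(int(np.argmax(scores)))
    return selected
```

For example, `mrmr_select(X_binary, labels, k)` would return the column indices of k words in selection order (`X_binary` and `labels` are hypothetical). The loop performs roughly k × F mutual-information evaluations for F candidate words, so it illustrates the criterion rather than being efficient for a large vocabulary.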

3.5. Classification of supervised machine learning

Many types of machine learning classifiers can be used to classify textual data [22]. This research focused on classification for the SA of Arabic texts. The pre-processed dataset was used to train four classification algorithms: three nonlinear classifiers (KNN, DT, RF) and a linear classifier (SVM).

3.5.1. K-nearest neighbors (KNN)

KNN is a supervised machine learning algorithm that classifies a test sample by computing its distance to the training samples, finding its closest neighbours, and assigning the class that is most common among them [23].
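A minimal KNN classifier following this description is sketched below. The Euclidean distance, the default k = 3, and the tie-breaking rule (among tied classes, the one encountered first in order of distance wins) are assumptions; the paper does not report its choice of k or distance metric. The function name `knn_predict` is hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Predict each test sample's label by majority vote among its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)          # Euclidean distance to every training sample
        nearest = np.argsort(dists, kind="stable")[:k]       # indices of the k closest samples
        votes = Counter(y_train[nearest].tolist())           # labels inserted nearest-first
        preds.append(votes.most_common(1)[0][0])             # equal counts: first encountered wins
    return np.array(preds)
```

scikit-learn's `KNeighborsClassifier` implements the same idea (its default is `n_neighbors=5` with Euclidean distance); cosine distance is another common choice for TF-IDF vectors.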

3.5.2. Support vector machine (SVM)

SVM is a widely used technique for traditional text classification [24]. It is a supervised machine learning algorithm that can handle both linear and nonlinear data and can be applied to both regression and classification problems; classification predicts a label, whereas regression predicts a continuous value [25]. In this work, SVM was used for classification.
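To make the margin idea concrete, here is a minimal binary linear SVM trained by full-batch subgradient descent on the regularized hinge loss, (lam/2)‖w‖² + mean(max(0, 1 − y(w·x + b))). It is an illustrative sketch, not the configuration used in the paper: the function names, learning rate, regularization strength, and epoch count are hypothetical, and the labels must be −1 and +1.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Binary linear SVM: minimise lam/2*||w||^2 + mean hinge loss by subgradient descent.

    y must contain labels -1 and +1. Returns the weight vector w and bias b.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_linear_svm(w, b, X):
    """Return +1 on the positive side of the hyperplane w.x + b = 0, else -1."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)
```

For the three sentiment classes in this study, a binary SVM must be extended, for example with a one-vs-rest scheme; scikit-learn's `LinearSVC` does this by default.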

3.5.3. Decision tree (DT)

A DT represents choices and their consequences as a tree-structured graph [26]. As a predictive model, it tests an item's attributes along the branches and predicts the item's target value at the leaves [27].

3.5.4. Random forest (RF)

RF is a supervised machine learning algorithm that also belongs to ensemble learning. It builds a multitude of decision trees during training, and the final class is determined by a majority vote over the trees' predictions [28].

3.5.5. Customer perception meta ensemble

In this work, we propose the customer perception meta-ensemble. It is based on ensemble learning, in which several base learners are trained and their predictions are combined to produce results superior to those of the individual base learners. Each of the individual models (KNN, SVM, DT, and RF) [29] was trained separately on our training set. The customer perception meta-ensemble combines the predictions of these individual models for each data point in the test set, and we recorded two different predictions for every test point. The first is the majority-vote prediction, commonly referred to as hard voting, which returns the most frequently predicted class. The second is the soft-voting prediction, which averages the predicted class probabilities across the models and returns the class with the highest average probability [30,31]. Applying both rules to all instances in the test set yields the results reported in Table 5.

Table 5.

Model performance using ensemble learning.

Model | Class | Accuracy | Precision (%) | Recall (%) | F1 (%)
Best baseline classifier (SVM) | −1 | 95.0% | 94.0 | 95.0 | 96.0
 | 0 | | 95.0 | 95.0 | 95.0
 | +1 | | 96.0 | 95.0 | 95.0
Customer perception meta-ensemble (hard voting) | −1 | 95.95% | 94.96 | 96.92 | 95.75
 | 0 | | 93.60 | 95.4 | 94.5
 | +1 | | 94.19 | 96.37 | 95.27
Customer perception meta-ensemble (soft voting) | −1 | 94.51% | 94.5 | 94.5 | 94.5
 | 0 | | 94.5 | 94.5 | 94.5
 | +1 | | 94.5 | 94.5 | 94.5
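Hard and soft voting, as described for the customer perception meta-ensemble, can be sketched with NumPy as follows. The input layouts and the function names are assumptions. With four base models, hard voting can produce ties (for example, two votes each for two classes); here ties go to the smallest label, which is also the rule documented for scikit-learn's `VotingClassifier`, but the paper does not say how ties were broken.

```python
import numpy as np


def hard_vote(predictions):
    """Majority vote over models.

    predictions: (n_models, n_samples) array of class labels.
    Ties go to the smallest label, because np.unique returns labels in sorted order.
    """
    predictions = np.asarray(predictions)
    result = []
    for column in predictions.T:  # one column per test sample
        labels, counts = np.unique(column, return_counts=True)
        result.append(labels[np.argmax(counts)])
    return np.array(result)


def soft_vote(probabilities, classes):
    """Average predicted probabilities over models and return the most probable class.

    probabilities: (n_models, n_samples, n_classes) array whose last axis follows `classes`.
    """
    mean_proba = np.mean(np.asarray(probabilities, dtype=float), axis=0)
    return np.asarray(classes)[np.argmax(mean_proba, axis=1)]
```

With scikit-learn, `VotingClassifier(estimators=..., voting='hard')` or `voting='soft'` provides the same two rules; soft voting needs `predict_proba`, so an `SVC` must be created with `probability=True`.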

3.6. Evaluation metrics

In this section, we describe the metrics used to evaluate our model and to compare the performance of the algorithms used in this work. Four metrics were used: accuracy, precision, recall, and the F1-score [27]. For each sentiment class in turn, treated as the positive class, these metrics are computed from the following four outcome counts:

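  • True Positive (TPos): reviews of the positive class that are correctly predicted as positive.

  • False Positive (FPos): reviews of other classes that are incorrectly predicted as positive.

  • True Negative (TNeg): reviews of other classes that are correctly not predicted as positive.

  • False Negative (FNeg): reviews of the positive class that are incorrectly predicted as another class.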

Accuracy: Accuracy (Acc) is the number of correctly predicted reviews divided by the total number of reviews, as shown in (5).

$$\mathrm{Accuracy} = \frac{TPos + TNeg}{TPos + FPos + FNeg + TNeg} \qquad (5)$$

Precision: Precision (P) is the number of reviews correctly predicted as positive divided by the total number of reviews predicted as positive, as shown in (6).

$$\mathrm{Precision} = \frac{TPos}{TPos + FPos} \qquad (6)$$

Recall: Recall (R) is the number of reviews correctly predicted as positive divided by the total number of reviews that actually belong to the positive class, as shown in (7).

$$\mathrm{Recall} = \frac{TPos}{TPos + FNeg} \qquad (7)$$

F1-score: The F1-score (F-1) is the harmonic mean of precision and recall, as shown in (8).

$$F1\text{-}\mathrm{Score} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \qquad (8)$$
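Eqs. (5) to (8) can be computed per class by treating each sentiment class in turn as the positive class (one-vs-rest), which is how per-class scores such as those in Tables 4 and 5 are commonly obtained. In this sketch, accuracy is the overall fraction of correct predictions, matching the verbal definition given for (5), and a zero denominator yields 0.0; both choices and the function name `per_class_metrics` are assumptions.

```python
import numpy as np


def per_class_metrics(y_true, y_pred, labels=(-1, 0, 1)):
    """Return overall accuracy and {label: (precision, recall, f1)} using Eqs. (6) to (8)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))  # correctly predicted reviews / all reviews
    report = {}
    for c in labels:
        tpos = np.sum((y_pred == c) & (y_true == c))
        fpos = np.sum((y_pred == c) & (y_true != c))
        fneg = np.sum((y_pred != c) & (y_true == c))
        precision = tpos / (tpos + fpos) if tpos + fpos else 0.0
        recall = tpos / (tpos + fneg) if tpos + fneg else 0.0
        f1 = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
        report[c] = (float(precision), float(recall), float(f1))
    return accuracy, report
```

`sklearn.metrics.classification_report` produces the same per-class precision, recall, and F1 values.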

4. Results and discussion

This section discusses the evaluation results and the model's performance. The experiments were implemented in Python 3 using various libraries. The data, containing all three sentiment categories (positive, negative, and neutral), were randomly split so that 80% were assigned to the training set and 20% to the test set.
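A random 80/20 split can be sketched as below. The seed and the function name `split_80_20` are hypothetical, and the paper does not say whether the split was stratified by class.

```python
import numpy as np


def split_80_20(X, y, seed=0):
    """Randomly assign 80% of the samples to training and 20% to testing, keeping X and y aligned."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))         # shuffled sample indices
    n_train = int(round(0.8 * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test]
```

A stratified alternative is scikit-learn's `train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)`, which keeps the proportions of positive, negative, and neutral tweets the same in both sets.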

4.1. Feature extraction

We then applied TF-IDF and MRMR feature extraction. Using TF-IDF, each processed tweet was converted into a vector of TF-IDF scores corresponding to the words in the tweet. MRMR scores were then computed for the same features (words). The words were ranked by MRMR score in descending order, and only the TF-IDF scores of the top k ranked words were retained and used to train the four machine learning models. The experimental results are shown in Fig. 3.
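The top-k selection step can be sketched as follows, assuming the TF-IDF features form a dense (tweets × words) array and that one MRMR score per word is available. The value of k is not reported in the paper, and the function name `keep_top_k_features` is hypothetical.

```python
import numpy as np


def keep_top_k_features(tfidf_matrix, mrmr_scores, k):
    """Keep the TF-IDF columns of the k words with the highest MRMR scores.

    tfidf_matrix: (n_tweets, n_words) array; mrmr_scores: one score per word (column).
    Returns the reduced matrix and the kept column indices (highest score first).
    """
    order = np.argsort(-np.asarray(mrmr_scores, dtype=float), kind="stable")  # descending by score
    top_k = order[:k]
    return np.asarray(tfidf_matrix)[:, top_k], top_k
```

The returned indices should be reused to select the same columns from the test-set TF-IDF matrix, and the MRMR scores are best computed on the training set only, so that no test labels influence feature selection.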

Fig. 3.

Impact of TF-IDF and MRMR feature extraction on classification accuracy.

Fig. 3 compares the accuracy of the classifiers before and after applying the feature extraction methods (TF-IDF and MRMR). Feature extraction substantially improved the accuracy of the models, making it a suitable approach for this classification task. Among the four classifiers tested, SVM achieved the highest accuracy (95%, tied with DT) after TF-IDF and MRMR were applied. By highlighting the most informative features, TF-IDF and MRMR improve both the efficiency and the accuracy of the predictive models and give a clearer view of the data's underlying structure, which in turn supports more strategic decision-making. These techniques are adaptable to a broad range of domains; in the context of Arabic text reviews of coffee products, they distil complex data into meaningful features that capture customer sentiments, preferences, and trends, making them valuable for extracting actionable intelligence from diverse data sources.

4.2. Classification of sentiment analysis

Table 4 and Fig. 4 report the results of applying the four supervised machine learning models (KNN, SVM, DT, RF) to sentiment analysis of customer perception, evaluated with accuracy, precision, recall, and F1-score derived from the confusion matrix. SVM and DT achieved the highest accuracy (95%), followed closely by RF (94%), while KNN achieved the lowest accuracy (74.0%).

Table 4.

Comparative sentiment analysis of each classifier.

Algorithm | Class | Accuracy | Precision (%) | Recall (%) | F1 (%)
KNN | −1 | 74.0% | 74.0 | 72.0 | 73.0
 | 0 | | 74.0 | 71.0 | 72.0
 | +1 | | 73.0 | 79.0 | 76.0
SVM | −1 | 95.0% | 97.0 | 95.0 | 95.0
 | 0 | | 94.0 | 95.0 | 96.0
 | +1 | | 95.0 | 95.0 | 95.0
DT | −1 | 95.0% | 96.0 | 95.0 | 95.0
 | 0 | | 93.0 | 93.0 | 93.0
 | +1 | | 93.0 | 95.0 | 94.0
RF | −1 | 94.0% | 97.0 | 95.0 | 96.0
 | 0 | | 93.0 | 95.0 | 94.0
 | +1 | | 96.0 | 95.0 | 95.0

Fig. 4.

Comparison of different classifiers based on Accuracy and Recall.

Finally, we compare the performance of the proposed ensemble model with the baseline classifiers. Table 5 shows that the proposed meta-ensemble with hard voting achieves the highest accuracy, 95.95%, outperforming the best baseline classifier, SVM (95.0%). The SVM baseline nonetheless remains competitive: it outperforms the soft-voting ensemble (94.51%). Overall, these results indicate that the hard-voting ensemble improves performance on the customer perception SA task. Ensembles that integrate k-nearest neighbors (KNN), support vector machines (SVM), decision trees (DT), and random forests (RF) are an effective approach for discerning customer perceptions from Arabic text reviews of coffee products. By consolidating the predictions of four diverse algorithms, an ensemble can reduce errors and mitigate the weaknesses of any single model, improving sentiment classification across the sentiment categories and helping to overcome the inherent complexities of Arabic sentiment analysis. Other ensemble strategies, such as boosting or bagging, can additionally help address data imbalance.

Among the individual classifiers, SVM was the best baseline (Table 5): in the sentiment analysis of Arabic text reviews of coffee products, it matched DT in accuracy (95%) and outperformed RF and KNN (Table 4). SVM's ability to handle high-dimensional, sparse, and intricate data such as TF-IDF vectors, its resistance to overfitting, and its capacity to model nonlinear relationships are likely contributors to this strong performance within our analysis framework.

5. Conclusion

We proposed a new approach for improving the accuracy of Arabic SA based on machine learning models. We constructed a dataset of 10,646 coffee product reviews collected from social media (Twitter). The data were manually annotated and then pre-processed to remove elements such as stop words and to normalize characters. We applied feature extraction using TF-IDF and MRMR and then trained KNN, SVM, DT, and RF classifiers. Ensemble learning was used to combine the classifiers' predictions into the final results. In testing, the proposed approach achieved an accuracy of 95.95% with hard voting and a comparable 94.51% with soft voting. The experimental results also show that feature extraction improves accuracy and increases the performance of our model. Moreover, by understanding customer sentiment and preferences gleaned from product reviews, businesses can uncover valuable opportunities for innovation. Our study emphasizes the importance of leveraging customer feedback to identify areas for product enhancement, prioritize feature development, and tailor offerings to better align with customer needs and desires. This work also supports the prioritization of feedback based on sentiment analysis results: by focusing on reviews that express negative sentiment or highlight areas for improvement, businesses can allocate resources more strategically and proactively address customer concerns, which enhances customer satisfaction and fosters stronger long-term customer relationships and loyalty. Finally, this work offers a promising technique for forecasting the popularity of coffee products by analyzing customer tweets about coffee advertisements.
The approach's accuracy also makes it a useful tool for forecasting market sentiment trends. Because it extracts actionable insights from textual data, the model can support businesses seeking to make informed decisions and remain competitive. Sentiment analysis of Arabic text remains a broad research area. Although this study improved the accuracy of SA for Arabic text, several directions remain for future work. For instance, future studies could examine products with many specifications, such as body care or beauty products. They could also adopt deep learning methods such as convolutional neural networks (CNNs) or artificial neural networks (ANNs).
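The conclusion reports different accuracies for hard voting (95.95%) and soft voting (94.51%). The following minimal, standard-library Python sketch illustrates how the two schemes can disagree on the same input.

The sketch rests on several assumptions. It uses three sentiment labels and hypothetical class-probability outputs for a single tweet from the four classifiers (KNN, SVM, DT, RF). The function names, the label order, the tie-breaking rule and all numbers are illustrative assumptions, not the authors' implementation or data.

```python
from collections import Counter
from typing import Sequence

# Assumed label order for every probability vector (illustrative only).
LABELS = ("negative", "neutral", "positive")


def argmax_label(probs: Sequence[float]) -> str:
    """Return the label with the highest probability (first one on ties)."""
    best_index = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best_index]


def hard_vote(predictions: Sequence[str]) -> str:
    """Majority vote over the labels predicted by each classifier.

    Ties are broken by the order of LABELS, an arbitrary choice for this sketch.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in LABELS:
        if counts.get(label, 0) == top:
            return label
    raise ValueError("predictions contain no known label")


def soft_vote(probabilities: Sequence[Sequence[float]]) -> str:
    """Average the classifiers' probability vectors and pick the arg max."""
    n = len(probabilities)
    averaged = [sum(p[i] for p in probabilities) / n for i in range(len(LABELS))]
    return argmax_label(averaged)


if __name__ == "__main__":
    # Hypothetical outputs for one tweet: [P(negative), P(neutral), P(positive)].
    classifier_probs = [
        [0.10, 0.35, 0.55],  # KNN
        [0.30, 0.15, 0.55],  # SVM
        [1.00, 0.00, 0.00],  # DT (trees often give fully confident leaves)
        [0.40, 0.45, 0.15],  # RF
    ]
    # In this sketch each classifier's hard label is the arg max of its probabilities.
    votes = [argmax_label(p) for p in classifier_probs]
    print(hard_vote(votes))               # positive
    print(soft_vote(classifier_probs))    # negative
```

With these made-up inputs, two of the four classifiers predict "positive", so hard voting returns "positive". Under soft voting, the decision tree's fully confident "negative" output pulls the averaged distribution towards "negative". This is one way the two schemes can produce different accuracies on the same test set.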

Data availability

The data of this study are available upon request.

CRediT authorship contribution statement

Ohud Alsemaree: Writing – original draft, Visualization, Software, Methodology, Formal analysis, Data curation. Atm S. Alam: Writing – review & editing, Writing – original draft, Conceptualization, Formal analysis, Investigation, Methodology, Supervision. Sukhpal Singh Gill: Writing – review & editing, Writing – original draft, Conceptualization, Formal analysis, Investigation, Methodology, Supervision. Steve Uhlig: Writing – review & editing, Writing – original draft, Conceptualization, Formal analysis, Investigation, Methodology, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

Ohud Alsemaree would like to thank the Saudi Arabia Cultural Mission and Umm Al-Qura University for their support and funding.

References

1. Number of internet and social media users worldwide as of October 2023. https://www.statista.com/statistics/617136/digital-population-worldwide/
2. Matz S.C., Netzer O. Using big data as a window into consumers' psychology. Current Opinion in Behavioral Sciences. Dec 2017;18:7–12.
3. Cui J., Wang Z., Beng Ho S., Cambria E. Survey on sentiment analysis: evolution of research methods and topics. Artif. Intell. Rev. 2023;56(8):8469–8510. doi: 10.1007/s10462-022-10386-z.
4. Liu B. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press; Cambridge, UK: 2015.
5. The most spoken languages worldwide in 2023. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
6. Shaalan K., Siddiqui S., Alkhatib M., Abdel Monem A. Challenges in Arabic natural language processing. In: Computational Linguistics, Speech and Image Processing for Arabic Language. World Scientific; 2019.
7. Phan H.T., Tran V.C., Nguyen N.T., Hwang D. Improving the performance of sentiment analysis of tweets containing fuzzy sentiment using the feature ensemble model. IEEE Access. 2020;8:14630–14641.
8. Krosuri L.R., Aravapalli R.S., Popuri S.R. Fine-grained sentiment analysis on online reviews. 2023 Third International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT). IEEE; 2023.
9. Wang D., et al. A hybrid system with filter approach and multiple population genetic algorithm for feature selection in credit scoring. J. Comput. Appl. Math. Feb 2018;329:307–321.
10. Shayaa S., et al. Sentiment analysis of big data: methods, applications, and open challenges. IEEE Access. 2018;6:37807–37827.
11. Al-Rubaiee H., Qiu R., Li D. Identifying Mubasher software products through sentiment analysis of Arabic tweets. 2016 International Conference on Industrial Informatics and Computer Systems (CIICS); Sharjah, United Arab Emirates: 2016. pp. 1–6.
12. Bolbol N.K., Maghari A.Y. Sentiment analysis of Arabic tweets using supervised machine learning. 2020 International Conference on Promising Electronic Technologies (ICPET); Jerusalem, Palestine: 2020. pp. 89–93.
13. Abo M.E.M., et al. A multi-criteria approach for Arabic dialect sentiment analysis for online reviews: exploiting optimal machine learning algorithm selection. Sustainability. Sep 2021;13(18).
14. Hawlader M., Ghosh A., Raad Z.K., Chowdhury W.A., Shehan M.S.H., Ashraf F.B. Amazon product reviews: sentiment analysis using supervised learning algorithms. 2021 International Conference on Electronics, Communications and Information Technology (ICECIT); Khulna, Bangladesh: 2021. pp. 1–6.
15. Alharbi L.M., Qamar A.M. Arabic sentiment analysis of eateries' reviews: Qassim region case study. 2021 National Computing Colleges Conference (NCCC); Taif, Saudi Arabia: 2021. pp. 1–6.
16. Bravo-Marquez F., Frank E., Pfahringer B. Building a Twitter opinion lexicon from automatically-annotated tweets. Knowl. Base Syst. 2016;108:65–78.
17. Alruily M., Shahin O.R. Sentiment analysis of Twitter data for Saudi universities. International Journal of Machine Learning and Computing. Jan 2020;10(1).
18. Bolón-Canedo V., Sánchez-Maroño N., Alonso-Betanzos A. A review of feature selection methods on synthetic data. Knowl. Inf. Syst. 2013;34:483–519.
19. Nazir M.K., Ahmad M., Ahmad H., Abdul Qayum M., Shahid M., Habib M.A. Sentiment analysis of user reviews about hotel in Roman Urdu. 2020 14th International Conference on Open Source Systems and Technologies (ICOSST); Lahore, Pakistan: 2020. pp. 1–5.
20. Saravanan R., Babu M. Enhanced text mining approach based on ontology for clustering research project selection. J. Ambient Intell. Hum. Comput. Dec 2017:1–11.
21. Ding C., Peng H. Minimum redundancy feature selection from microarray gene expression data. J. Bioinf. Comput. Biol. 2005;3(2):185–205. doi: 10.1142/s0219720005001004.
22. Fang H., Tang P., Si H. Feature selections using minimal redundancy maximal relevance algorithm for human activity recognition in smart home environments. Journal of Healthcare Engineering. 2020:1–13.
23. Touahri I., Mazroui A. Enhancement of a multi-dialectal sentiment analysis system by the detection of the implied sarcastic features. Knowl. Base Syst. 2021;227.
24. Hicham N., Karim S., Habbat N. An efficient approach for improving customer sentiment analysis in the Arabic language using an ensemble machine learning technique. 2022 5th International Conference on Advanced Communication Technologies and Networking (CommNet); Marrakech, Morocco: 2022. pp. 1–6.
25. Alzyout M., AL Bashabsheh E., Najadat H., Alaiad A. Sentiment analysis of Arabic tweets about violence against women using machine learning. 2021 12th International Conference on Information and Communication Systems (ICICS); Valencia, Spain: 2021. pp. 171–176.
26. Srivastava, Soni V. Kumar. A systematic review on sentiment analysis approaches. 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE); Greater Noida, India: 2022. pp. 1–6.
27. Rathi M., Malik A., Varshney D., Sharma R., Mendiratta S. Sentiment analysis of tweets using machine learning approach. 2018 Eleventh International Conference on Contemporary Computing (IC3); Noida, India: 2018. pp. 1–3.
28. Mienye I.D., Sun Y. A survey of ensemble learning: concepts, algorithms, applications, and prospects. IEEE Access. 2022;10:99129–99149.
29. Al-Hashedi A., Al-Fuhaidi B., Mohsen A., Ali Y., Al-Kaf H., Al-Sorori W., Maqtary N. Ensemble classifiers for Arabic sentiment analysis of social network (twitter data) towards COVID-19-related conspiracy theories. Appl. Comput. Intell. Soft Comput. 2022:1–10. Article ID 6614730.
30. Karthika P., Murugeswari R., Manoranjithem R. Sentiment analysis of social media network using random forest algorithm. 2019 IEEE International Conference on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS); Tamilnadu, India: 2019. pp. 1–5.
31. Hicham N., Karim S., Habbat N. Customer sentiment analysis for Arabic social media using a novel ensemble machine learning approach. Int. J. Electr. Comput. Eng. 2023;13:4504–4515.

Articles from Heliyon are provided here courtesy of Elsevier
