Abstract
Sarcasm is a form of sentiment often used for comedic effect. Its widespread use contributes to frequent misinterpretation of humour-based comments among native Bengali speakers. The growing prevalence of sarcasm in Bengali necessitates further study using natural language processing (NLP), as detecting Bengali sarcasm remains particularly challenging. To address this, the study introduces BanglaSarc3, a ternary-class dataset of 12,089 Facebook comments categorised as sarcastic (4012 instances), neutral (4056 instances), and non-sarcastic (4021 instances). The dataset tackles humour misinterpretation, which often leads to digital conflicts, and provides a valuable resource for improving sarcasm detection in Bengali NLP research. It serves as a benchmark for evaluating NLP models on Bengali sarcasm classification, fostering linguistic diversity and inclusive language models while ensuring balanced category representation. To enhance data quality, pre-processing steps such as anonymisation and duplicate removal were applied, and three native Bengali speakers independently assessed the text labels to ensure reliability. Designed to advance NLP research, BanglaSarc3 supports applications in sarcasm detection, sarcastic text classification, language modelling, and education. By providing a robust foundation for sarcasm analysis in Bengali, it enables the development of precise, context-aware NLP models. The dataset is openly available for academic and research purposes, promoting collaboration and innovation within the Bengali NLP community.
Keywords: Bengali sarcasm detection, Bengali language, Bengali natural language processing, Humour Interpretation, Contextual Understanding
Specifications Table
| Subject | Computer Sciences |
| Specific subject area | Machine Learning, Natural Language Processing, Bengali Text Classification, Bengali Sarcasm Classification. |
| Type of data | Text (provided as an .xlsx spreadsheet) |
| Data collection | The BanglaSarc3 dataset was compiled from various Bengali sources, primarily from Facebook pages dedicated to Bengali sarcasm. Native Bengali speakers carefully curated this selection of sarcastic comments, ensuring the integrity of all labels. The dataset comprises 12,089 manually selected entries categorised into three classes: Sarcastic (4012), Neutral (4056), and Non-sarcastic (4021). To ensure reliability and maintain high accuracy, native Bengali speakers meticulously rechecked the dataset, making it a valuable resource for Bengali NLP research. |
| Data source location | Platform: Facebook; Content type: humour-based Bengali Facebook posts and comment sections; Institution: Daffodil International University, Individual Home-Lab; City: Birulia-Savar, Dhaka; Country: Bangladesh |
| Data accessibility | Repository name: Mendeley Data; Data identification number: 10.17632/7tn76wdhsr.1; Direct URL to data: https://data.mendeley.com/datasets/7tn76wdhsr/1 |
| Related research article | None |
1. Value of the Data
- By categorising sentences into three classes (Sarcastic, Non-Sarcastic, and Neutral), the dataset aids the creation of classifiers that can accurately detect sarcasm in a given sentence. This enhances the performance of text classification systems, making them more precise in tasks that require nuanced emotional and contextual interpretation.
- The dataset improves language models by providing the data needed to learn the complexities of sarcastic expressions in Bengali. This results in more accurate and context-aware language models, which benefit applications such as sentiment analysis, chatbot interactions, and content moderation.
- For linguists and language researchers, BanglaSarc3 offers a rich resource for studying the pragmatic and cultural aspects of sarcasm in Bengali. It allows in-depth analysis and comparison of sarcasm usage patterns, contributing to the broader fields of computational linguistics and sociolinguistics.
- The dataset is a valuable educational resource for teaching and learning about sarcasm detection and understanding in Bengali. It can be integrated into language learning applications and educational platforms to provide learners with practical examples and exercises, enhancing cultural and contextual awareness.
- By setting a benchmark for sarcasm detection in Bengali, BanglaSarc3 encourages the development of similar datasets for other languages, fostering a more inclusive and comprehensive approach to NLP research across different linguistic and cultural contexts.
2. Background
This dataset was created to address the lack of annotated resources for sarcasm detection in the Bengali language. Sarcasm is a complex challenge for NLP because it involves verbal irony [1] that shifts words from their literal meaning to the intended interpretation. Researchers have extensively investigated sarcasm detection in English and other widely spoken languages, yet Bengali remains understudied in this field, and the shortage of structured datasets hinders research into machine learning methods for accurately detecting sarcasm in Bengali text. Bengali, spoken by more than 265 million people [2], has intricate grammatical systems and phrasal variation that make sarcasm detection challenging [3]. The dataset contains meticulously selected content from social media platforms to provide full coverage of sarcastic, non-sarcastic, and neutral speech patterns, and was annotated by native Bengali speakers to guarantee consistency and reliability. It establishes a foundational tool that accelerates Bengali NLP research and improves sarcasm detection methods, enabling benchmark models and opening research opportunities in both language studies and computational analysis of Bengali sarcasm.
3. Data Description
As of 2025, Bengali ranks among the top five most-spoken languages worldwide, with over 278 million speakers. Despite its extensive reach, it remains significantly underrepresented in machine learning and natural language processing (NLP) research. Unlike other languages, Bengali exhibits unique nuances in sarcasm, deeply intertwined with the emotional expressions of native speakers. Its intricate grammatical structure, coupled with the rich use of native phrases and contextual phrasing, presents a complex semantic landscape. This complexity, rooted in its morphological and syntactical features, poses significant challenges for NLP applications, including text classification, machine translation, and sarcasm detection. To overcome this challenge, the study developed the “BanglaSarc3” dataset, comprising a carefully curated set of 12,089 sentences tailored specifically for this study. The dataset is provided in the repository as the raw data file “BanglaSarc3 (Original).xlsx”
The BanglaSarc3 dataset was created by sourcing humour-based comments made by native Bengali Facebook users. A crucial aspect of dataset creation was achieving a balanced distribution across the Sarcastic, Non-Sarcastic, and Neutral classes. Introducing a ternary-class dataset was important because previous studies note that it is often hard for a person to differentiate between sarcastic and non-sarcastic comments [1], so a dedicated neutral class was necessary for this study of sarcasm. Sarcasm among Bengali speakers is closely tied to their emotional context, which shapes the use of sarcastic and insulting remarks. Most previous studies favoured two classes of Bengali sarcastic data and did not include a neutral class to distinguish non-offensive remarks from non-sarcastic comments.
In this study, three classes of Bengali comments were created and labelled as Sarcastic, Non-Sarcastic, and Neutral, defined as:

a) Sarcastic: comments that sound offensive or harsh but are meant humorously or playfully.

b) Non-Sarcastic: bluntly harmful or offensive remarks with no hidden meaning.

c) Neutral: ordinary, non-emotional statements without sarcasm or harm.
The following section provides a comprehensive overview of the variables within the BanglaSarc3 dataset, enabling researchers to maximize its potential for advancing NLP applications in the Bengali language. As shown in Table 1, the dataset's structure outlines the various components and their corresponding annotations, supporting its use in research and development. Fig. 1 depicts the class distribution among the three distinct categories: 4012 sentences (33.18 %) representing sarcastic remarks, 4056 sentences (33.55 %) representing neutral remarks, and 4021 sentences (33.25 %) in the Non-Sarcastic class.
Table 1.
Dataset description with attributes and possible values.
| Attribute | Description and Possible Values |
|---|---|
| Comments | This attribute represents the original text in Bengali script sourced from Facebook. Each text is a remark left by a native Bengali speaker, manually sourced and annotated. Examples: (1) ১৪ই ফেব্রুয়ারী কে ৪৪ জানুয়ারী বানিয়ে দেন,এটাই একমাত্র নিবৃত্তি (Make 14th February 44th January, this is the only cure); (2) মানুষ এক সময় ভাববে খাল কেটে কুমির এনেছি। কারণ প্রযুক্তির কাছে সবাই নিরুপায় হয়ে যাবে। (One day people will think that they cut the canal and brought crocodiles, because everyone will become helpless before technology.); (3) পারিবারিক শিষ্টাচার ও সামাজিক মূল্যবোধের অবক্ষয়। (Deterioration of family etiquette and social values.) |
| Label | One of three classes: Sarcastic, Neutral, or Non-Sarcastic |
Fig. 1.
The class distribution of each category of Sarcastic Remarks.
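The class proportions shown in Fig. 1 follow directly from the class counts. A minimal Python sketch (the counts are taken from the paper; the percentages are recomputed here, so the last digit may differ from the figure by rounding):

```python
# Class counts as reported for BanglaSarc3.
counts = {"Sarcastic": 4012, "Neutral": 4056, "Non-Sarcastic": 4021}

total = sum(counts.values())  # 12,089 comments in total
for label, n in counts.items():
    share = 100 * n / total
    print(f"{label}: {n} ({share:.2f} %)")
```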
Fig. 2 demonstrates the length distribution of the BanglaSarc3 dataset across its three labels: Sarcastic, Neutral, and Non-Sarcastic. The Sarcastic class has 4012 instances with a mean length of 61.72 characters and a standard deviation of 40.74 characters, with 50 % falling between 36 and 52 characters and a median length of 76 characters; these sentences range from 8 to 861 characters. The Neutral class consists of 4056 instances, averaging 85.19 characters with a standard deviation of 53.29 characters, where 50 % fall between 51 and 72 characters and the median length is 104 characters; Neutral sentences range from 8 to 507 characters. Finally, the Non-Sarcastic class has 4021 instances, averaging 90.42 characters with a standard deviation of 79.14 characters, where 50 % fall between 43 and 68 characters and the median is 108 characters; instances in this class range from 4 to 1000 characters in length.
Fig. 2.
Frequency Distribution of Bengali comments for a) Sarcastic, b) Neutral, and c) Non-Sarcastic classes.
Overall, the BanglaSarc3 dataset shows a consistent and balanced distribution of Sarcastic, Neutral, and Non-Sarcastic Bengali comments. Neutral sentences are comparatively longer, suggesting that native Bengali speakers elaborate when expressing neutral thoughts. The variation in sentence lengths is consistent across categories, with the Non-Sarcastic class exhibiting the longest maximum sentence length. This statistical analysis aids in understanding the dataset's characteristics, which is essential for building and validating models for Bengali NLP applications.
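The per-class length statistics above (count, mean, standard deviation, median, interquartile range, and range) can be computed with the standard library alone. A sketch, assuming each class is held as a plain list of strings:

```python
import statistics as st

def char_length_stats(sentences):
    """Character-length summary for a list of sentences."""
    lengths = [len(s) for s in sentences]
    q1, median, q3 = st.quantiles(lengths, n=4)  # quartiles (exclusive method)
    return {
        "count": len(lengths),
        "mean": st.mean(lengths),
        "stdev": st.stdev(lengths),
        "median": median,
        "iqr": (q1, q3),
        "range": (min(lengths), max(lengths)),
    }

# Toy example: five sentences with lengths 10, 20, 30, 40, 50 characters.
stats = char_length_stats(["a" * n for n in (10, 20, 30, 40, 50)])
print(stats["mean"], stats["median"], stats["iqr"])  # 30 30.0 (15.0, 45.0)
```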
Table 2 displays the 20 most common words in Sarcastic, Neutral, and Non-Sarcastic sentences from the BanglaSarc3 dataset, along with their respective frequencies. The word ``ভাই'' (Brother) appears most frequently in sarcastic sentences (350 times), underscoring its prevalent use as an informal form of address and reflecting the friendly nature of Bengali culture. In neutral sentences, ``না'' (No) is the most prominent (371 times), typically expressing disagreement with an opinion. In non-sarcastic sentences, ``না'' (No) also tops the list with 488 occurrences, reflecting how often speakers push back when others' views clash with their own beliefs and opinions.
Table 2.
The 20 most common words in the Sarcastic, Neutral, and Non-Sarcastic classes with their respective frequencies.
| Sarcastic | | | Neutral | | | Non-Sarcastic | | |
|---|---|---|---|---|---|---|---|---|
| Bengali Word | English Form | Frequency | Bengali Word | English Form | Frequency | Bengali Word | English Form | Frequency |
| ভাই | Brother | 350 | না | No | 371 | না | No | 488 |
| না | No | 246 | ভালো | Good | 291 | ভালো | Good | 305 |
| একটা | One | 189 | কথা | Talk | 196 | কথা | Talk | 265 |
| মনে | In Heart | 173 | একটা | One | 188 | মানুষ | Human | 259 |
| কথা | Talk | 151 | মানুষ | Human | 181 | একটা | One | 257 |
| ভালো | Good | 147 | আল্লাহ | Allah | 162 | ভাই | Brother | 215 |
| টা | The | 115 | মানুষের | Human’s | 159 | মনে | In heart | 212 |
| মানুষ | Human | 100 | সুন্দর | Beautiful | 155 | মানুষের | Humans | 182 |
| টাকা | Money | 93 | হবে | Will be | 153 | দেশের | Countries | 170 |
| শেষ | End | 93 | কাজ | Work | 151 | হবে | Will be | 146 |
| হবে | Will be | 89 | পারে | Can do | 146 | আল্লাহ | Allah | 145 |
| এক | One | 87 | মনে | In heart | 144 | বেশি | More | 127 |
| নাই | No, it’s not | 78 | হয় | is | 134 | টাকা | Money | 127 |
| কমেন্ট | Comment | 76 | বেশি | More | 132 | দেশে | In Country | 126 |
| সবাই | Everyone | 72 | এক | One | 125 | নাই | No, it’s not | 125 |
| শেখ | Sheikh | 70 | করে | By doing | 123 | এক | One | 124 |
| বেশি | More | 69 | খুব | Very | 118 | কাজ | Work | 123 |
| পোস্ট | Post | 69 | সময় | Time | 113 | করে | Is doing | 116 |
| একটু | A bit | 67 | ভাই | Brother | 111 | একজন | One person | 108 |
| আছে | There is | 64 | নতুন | New | 110 | খুব | very | 107 |
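Frequencies like those in Table 2 come from simple token counting. A whitespace-tokenised sketch (real Bengali tokenisation may need more care, e.g. punctuation stripping, which this toy version omits):

```python
from collections import Counter

def top_words(sentences, k=20):
    """Most common whitespace-separated tokens across a list of sentences."""
    counts = Counter(word for s in sentences for word in s.split())
    return counts.most_common(k)

# Toy corpus of two short Bengali comments:
print(top_words(["ভাই ভালো না", "না ভালো না"], k=2))  # [('না', 3), ('ভালো', 2)]
```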
These frequency counts include stop-words, which are usually excluded in text analysis; in Bengali, however, and especially in sarcasm, stop-words carry significant semantic and contextual value within a sentence. Despite this limitation, the high-frequency words reflect the cultural and behavioural patterns of Bengali speakers on social media, including the casual forms of address common among Bengalis. Retaining them helps sequential models capture semantic structure, improving model accuracy, and also supports linguistics-oriented studies of the structural integrity of Bengali. The dataset does not contain noise or non-Bengali words, as the word clouds show: punctuation, emojis, and non-Bengali words were removed, and compound words were reduced to a normal form, yielding a consistent, clean dataset well suited for training NLP models.
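The filtering described above (removing punctuation, emojis, and non-Bengali words while keeping the Bengali script intact) can be approximated with a Unicode-range filter. The authors' exact cleaning rules are not published, so this is an assumption-laden sketch:

```python
import re

# The Bengali script occupies the Unicode block U+0980-U+09FF.
NON_BENGALI = re.compile(r"[^\u0980-\u09FF\s]+")

def clean_comment(text):
    """Drop every character outside the Bengali block, then collapse whitespace."""
    text = NON_BENGALI.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_comment("ভাই 😂 this is hilarious!!"))  # → ভাই
```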
Fig. 3 shows the generated Bangla word cloud for each class, in which the dominance of “ভাই (brother)” is prominent in sarcastic comments. In the neutral and non-sarcastic classes, “না (No)” dominates usage among native Bengali speakers. Fig. 4 contains the same word clouds translated into English for the sarcastic, neutral, and non-sarcastic classes. Each word cloud illustrates the frequency and prominence of words within its category based on their occurrence in the dataset, offering a clear and intuitive visualisation of language usage trends across the three classes.
Fig. 3.
Bengali text word cloud for a) Sarcastic, b) Neutral, and c) Non-Sarcastic class in BanglaSarc3 dataset.
Fig. 4.
Corresponding word cloud for a) Sarcastic, b) Neutral, and c) Non-Sarcastic class in BanglaSarc3 dataset.
4. Experimental Design, Materials and Methods
To generate a large volume of text data, especially in Bengali, which presents considerable challenges, the BanglaSarc3 dataset was created following a structured approach. Initially, Bengali text was collected from publicly available posts on Facebook pages [4]. The gathered content was then compiled into a Google Spreadsheet for aggregation. Next, the dataset underwent thorough pre-processing, including initial cleaning, anonymisation, duplicate removal, and filtering of any profane language.
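Duplicate removal in the pipeline above can be approximated by normalising whitespace before comparing comments; the authors' exact procedure is not specified, so the sketch below is one plausible approach:

```python
import re

def normalise(text):
    """Collapse runs of whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip()

def drop_duplicates(comments):
    """Keep the first occurrence of each normalised comment, preserving order."""
    seen, unique = set(), []
    for comment in comments:
        key = normalise(comment)
        if key not in seen:
            seen.add(key)
            unique.append(comment)
    return unique

print(len(drop_duplicates(["ভাই  রে", "ভাই রে ", "ভালো"])))  # → 2
```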
The data selection process was carried out carefully while maintaining the consistency of each class as closely as possible. Potential topical biases were addressed, since the same sentence can mean different things in different contexts. To mitigate this, each team member followed a set of inclusion and exclusion criteria, presented in Table 3. One key consideration was a selective process for including text blocks: sentences were chosen from recent topics with a minimum standard of content quality, avoiding overly dense contexts. Another rule was to keep at least 200 data points within the same context and not exceed 250 from a single context. Each context could include multiple posts across different pages, as Facebook page admins often reuse the same post across various pages or copy content directly. One of the most challenging aspects was managing the age diversity of users; the varying opinions of each group were considered to preserve contextual preferences in the data, as users often respond to sarcasm with more sarcasm. Sarcasm is not confined to pages dedicated to it; it can appear anywhere once a user's composure breaks down. Facebook, being an informal social media platform, contains a wide range of opinions, so restricting collection to a specific type of page was not a primary concern. Further details on the inclusion and exclusion criteria for data collection are provided in Table 3.
Table 4.
Fleiss’ Kappa Level of Agreement Reference Table.
| Fleiss’ Kappa Value | Level of Agreement |
|---|---|
| < 0.00 | Poor |
| 0.00 – 0.20 | Slight |
| 0.21 – 0.40 | Fair |
| 0.41 – 0.60 | Moderate |
| 0.61 – 0.80 | Substantial |
| 0.81 – 1.00 | Almost Perfect |
Table 3.
Inclusion and Exclusion Criteria for Collecting Text Data.
| Criteria | Description |
|---|---|
| Inclusion | Any kind of Facebook page (focusing more on sarcastic ones for enriched data) |
| | Slight insults, slight use of slang, friendly banter, etc. |
| | At least 200 data points from within the same context |
| | Texts with sufficient context to indicate sarcasm |
| | Sentences of at least 5 words |
| | Maintaining class balance |
| Exclusion | Non-Bengali sentences and promotional posts |
| | Text length exceeding 1000 words (some paragraphs were retained to capture context) |
| | Duplicate texts, incomplete texts, and texts that include political names |
| | No more than 200 data points from the same context, to mitigate potential biases |
| | Texts too ambiguous or subjective for annotators to agree on the presence of sarcasm |
| | Low-context statements and non-human texts (e.g., AI- or tool-generated) |
In the next phase, the dataset was carefully evaluated by three native Bengali speakers. Each evaluator independently classified the sentences into three distinct text categories: Sarcastic, Neutral, and Non-Sarcastic. This rigorous assessment ensured precise categorisation, enhancing the dataset's reliability and overall quality. Fig. 5 illustrates the structured workflow of the dataset generation process, outlining each step from data collection to final labelling by native speakers.
Fig. 5.
Data collection and pre-processing methodology to create the BanglaSarc3 dataset.
The BanglaSarc3 dataset was created using a structured approach. After collection, the text underwent pre-processing to ensure clarity, coherence, and correctness, including spelling correction, duplicate removal, punctuation removal, and filtering of emojis and expressions. Next, three annotators independently labelled the data into three categories (Sarcastic, Neutral, and Non-Sarcastic), with Fleiss’ Kappa used to measure agreement. Disagreements were resolved through majority voting, confidence scoring, and random checking. This systematic process produced a high-quality, cleaned, and accurately labelled dataset, making BanglaSarc3 a reliable resource for Bengali sarcasm detection research. The annotators followed the widely used Fleiss’ Kappa interpretation scale, shown in Table 4.
During the data annotation phase, Inter-Annotator Agreement (IAA) was assessed using the Fleiss’ Kappa coefficient, as outlined in Algorithm 1 [4]. This coefficient measures the consistency of annotators in classifying text as Sarcastic, Neutral, or Non-Sarcastic. A high Fleiss’ Kappa score indicates strong agreement, reinforcing the reliability of the labelled data [5].
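Fleiss’ Kappa is straightforward to compute from a ratings matrix. A standard-formula sketch (not the authors' code), where each row holds, per sentence, the number of annotators who chose each of the three labels:

```python
def fleiss_kappa(ratings):
    """Fleiss' Kappa: kappa = (P_bar - P_e) / (1 - P_e)."""
    N = len(ratings)     # number of items (sentences)
    n = sum(ratings[0])  # annotators per item (assumed constant)
    k = len(ratings[0])  # number of categories

    # Proportion of all assignments falling in each category.
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    # Observed agreement: per-item agreement, averaged over items.
    P_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings) / N
    # Expected agreement by chance.
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Three annotators, three labels (Sarcastic, Neutral, Non-Sarcastic):
rows = [[3, 0, 0], [0, 3, 0], [0, 0, 3], [2, 1, 0]]
print(round(fleiss_kappa(rows), 3))  # → 0.745
```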
Algorithm 1.
BanglaSarc3 Annotation Process.
| Input: | Raw Bengali Comments, |
| Output: | Class of Bengali text, |
| Step-1: | Segment the sentences into manageable batches for annotation (i.e., 4000 sentences per speaker) |
| Step-2: | Select a group of 3 native Bengali speakers with strong language proficiency. |
| Step-3: | Assign each batch of sentences to multiple annotators (e.g., 3 annotators per sentence) to ensure redundancy. |
| Step-4: | Use ‘Google Spreadsheet’ where annotators can label each sentence as sarcastic, neutral, and non-sarcastic |
| Step-5: | Calculate agreement using statistical measures (i.e., Fleiss' Kappa) to calculate the inter-annotator agreement (IAA) |
| Step-6: | Calculate Fleiss’ Kappa: |
| Step-7: | κ = (P̄ − P̄e) / (1 − P̄e), where P̄ is the mean per-item agreement and P̄e is the expected agreement by chance |
| Step-8: | Checking conflict of resolution by reviewing disagreements and sorting them for further review, and annotating with an expert annotator |
| Step-9: | Measuring majority voting by counting the number of votes for each label and setting the final annotation with the highest vote |
| Step-10: | Calculate the confidence score for each sentence based on the proportion of annotators who agreed on the label. |
| Step-11: | Confidence = (number of annotators agreeing on the final label) / (total number of annotators) |
| Step-12: | Checking randomly sampled annotated sentences and performing a quality check to ensure annotations meet the required standards |
| Step-13: | Compile the final annotated sentences into the BanglaSarc3 dataset. |
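Steps 9 and 10 of the annotation process (majority voting and the confidence score) can be sketched as:

```python
from collections import Counter

def resolve_label(annotations):
    """Majority vote over annotator labels, plus the share of agreeing annotators."""
    votes = Counter(annotations)
    label, count = votes.most_common(1)[0]
    confidence = count / len(annotations)
    return label, confidence

# Two of three annotators agree, so the label is kept with 2/3 confidence.
print(resolve_label(["Sarcastic", "Sarcastic", "Neutral"]))
```

In practice, a low confidence score would route the sentence to the expert review of Step 8 rather than accept the majority label outright.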
This study primarily accepted data whose agreement fell in the Almost Perfect range of the Fleiss’ Kappa interpretation scale given in Table 4.
Table 5.
A portion of hard-to-categorise data (For both Humans and machines).
(The examples in Table 5 are provided as an image in the original article.)
Flagging hard-to-categorise data was an important consideration, since each person understands sarcasm differently, and this ambiguity affects a machine's understanding as well. Ultimately, data at the Substantial level of agreement was also included because of the contextual integrity it provides; some examples are listed in Table 5. For machines, handling such ambiguous cases requires further model tuning to achieve better results.
After completing data annotation, four deep learning models were evaluated for classifying Bengali Comments into Sarcastic, Neutral, and Non-Sarcastic categories. Model training was conducted with a batch size of 64 over 50 epochs. Five performance metrics were computed to determine the best-performing model. Among the models, RNN+BiLSTM achieved the highest average accuracy of 89.67 %, surpassing others, while LSTM recorded the lowest at 80.51 %. For class-wise performance, the RNN+BiLSTM model excelled in sarcastic text classification with 89.71 % accuracy. Additionally, RNN+BiLSTM performed exceptionally across all metrics, including precision (89.67 %), recall (89.65 %), F1-score (89.66 %), and an AUC of 0.99, indicating near-perfect performance. These results, along with detailed comparisons, are summarised in Table 6.
Table 6.
Performance of four applied neural network-based deep learning models on the BanglaSarc3 dataset.
| Model | Class | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC |
|---|---|---|---|---|---|---|
| LSTM | Sarcastic | 82.21 | 81.23 | 81.25 | 81.24 | 0.99 |
| | Neutral | 80.62 | 80.65 | 80.61 | 80.64 | 0.99 |
| | Non-Sarcastic | 78.62 | 78.63 | 78.62 | 78.63 | 0.98 |
| BiLSTM | Sarcastic | 82.35 | 82.29 | 82.32 | 82.31 | 0.99 |
| | Neutral | 82.52 | 82.64 | 82.68 | 82.65 | 0.99 |
| | Non-Sarcastic | 81.96 | 81.94 | 81.99 | 82.97 | 0.99 |
| Conv1D-LSTM | Sarcastic | 86.72 | 86.67 | 86.68 | 86.75 | 0.99 |
| | Neutral | 86.61 | 86.43 | 86.47 | 86.45 | 0.99 |
| | Non-Sarcastic | 86.63 | 86.65 | 86.66 | 86.65 | 0.99 |
| RNN+BiLSTM | Sarcastic | 89.71 | 89.79 | 89.72 | 89.71 | 1.00 |
| | Neutral | 89.67 | 89.67 | 89.61 | 89.62 | 0.99 |
| | Non-Sarcastic | 89.62 | 89.68 | 89.62 | 89.61 | 0.99 |
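Per-class scores like those in Table 6 follow from one-vs-rest confusion counts. A minimal sketch of the metric definitions (not the authors' evaluation code):

```python
def per_class_metrics(y_true, y_pred, cls):
    """Precision, recall, and F1 for one class, treated one-vs-rest."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

truth = ["Sarcastic", "Sarcastic", "Neutral", "Non-Sarcastic"]
preds = ["Sarcastic", "Neutral", "Neutral", "Non-Sarcastic"]
# For the Sarcastic class: precision 1.0, recall 0.5, F1 ≈ 0.667.
print(per_class_metrics(truth, preds, "Sarcastic"))
```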
Limitations
The BanglaSarc3 dataset, while valuable for Bengali NLP, has certain limitations. The inclusion of stop-words may reduce the effectiveness of word frequency analysis. Manual labelling by three native speakers introduces potential bias and inconsistency. Additionally, the dataset’s sources, such as blogs, Facebook pages, magazines, and news articles, may not fully capture the diversity of Bengali language usage, impacting generalisability. Pre-processing steps might inadvertently remove relevant information along with noise. Moreover, categorising only Sarcastic, Neutral, and Non-Sarcastic classes may oversimplify the semantic and contextual complexity of Bengali, overlooking nuanced grammatical structures. Despite these challenges, the BanglaSarc3 dataset remains a crucial resource for Bengali NLP research.
Ethics Statement
The BanglaSarc3 dataset emphasises ethical data collection practices. Publicly available content from blogs, Facebook pages, magazines, and news articles was responsibly sourced and incorporated. Strict criteria ensured that only copyright-free material was included, preventing any potential infringement. Additionally, the dataset adheres to principles of responsible use, ensuring no harm or violation of individual rights. For content obtained from Facebook pages, compliance with Facebook’s content usage policies ensured that no additional permissions were required [6].
Acknowledgements
We want to extend our heartfelt gratitude to the three annotators (Prof. Dr. A B Kareem and his teams) who played a crucial role in creating the BanglaSarc3 dataset. Their expertise, dedication, and meticulous attention to detail have been invaluable in ensuring the accuracy and reliability of the dataset. Their efforts in annotating the sentences and categorising them into sarcastic, neutral and non-sarcastic have significantly contributed to the quality and usefulness of this resource. We deeply appreciate their time, knowledge, and commitment to advancing natural language processing research in Bengali. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Declaration of Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data Availability
References
- 1. Misra R., Arora P. Sarcasm detection using news headlines dataset. AI Open. 2023. doi: 10.1016/j.aiopen.2023.01.001.
- 2. Sen O., Fuad M., Islam Md.N., Rabbi J., Masud M., Hasan Md.K., Awal Md.A., Ahmed Fime A., Hasan Fuad Md.T., Sikder D., Raihan Iftee Md.A. Bangla natural language processing: a comprehensive analysis of classical, machine learning, and deep learning-based methods. IEEE Access. 2022;10:38999–39044. doi: 10.1109/ACCESS.2022.3165563.
- 3. Das A.K., Al Asif A., Paul A., Hossain Md.N. Bangla hate speech detection on social media using attention-based recurrent neural network. Journal of Intelligent Systems. 2021;30:578–591. doi: 10.1515/jisys-2020-0060.
- 4. Bijoy Md.H.I., Ayman U., Islam Md.M. BanglaTense: a large-scale dataset of Bangla sentences categorised by tense: past, present, and future. Data in Brief. 2025. doi: 10.1016/j.dib.2025.111400.
- 5. Haque Md.Z., Zaman S., Saurav J.R., Haque S., Islam Md.S., Amin M.R. B-NER: a novel Bangla named entity recognition dataset with largest entities and its baseline evaluation. IEEE Access. 2023;11:45194–45205. doi: 10.1109/ACCESS.2023.3267746.
- 6. Page public content access - Graph API (2025). https://developers.facebook.com/docs/features-reference/page-public-content-access/276 (accessed July 16, 2024).