Table 1.
Summary of existing studies: objectives, methods, datasets, and evaluation measures.
| Reference | Objective | Method | Dataset | Measures |
|---|---|---|---|---|
| Islam et al.11 | To explore how ML and sensor technologies enhance emotion perception and activity recognition. | Multimodal Sensor Data, DL, Temporal–Spatial Behaviour Modelling, Cognitive and Affective Computing | AAL or Smart Home Sensor Dataset | Quality of Life, Emotion Recognition Accuracy, Activity Detection |
| Romero and Armenta12 | To develop a real-time model for detecting and classifying seven facial emotions in children. | Camera Input, CNN on Raspberry Pi 3B+, Trained CNN Model | FER‑2013 | Facial Emotion Identification Rate |
| Asha et al.13 | To develop a modular, voice-controlled AI assistant for seamless human–machine interaction. | NLP, Voice Commands | Pre-Trained Models | Real-Time Performance, Interaction Accuracy |
| Pavithra et al.14 | To develop a DL-based SER system for accurate and reliable emotion detection from speech. | Audio Feature Extraction, DL, RNN | Labelled Emotional Speech Samples | Accuracy, Reliability |
| Brilli et al.15 | To develop AIris, an AI-powered wearable device for visually impaired users. | Object Recognition, Scene Interpretation, NLP, Real-Time Auditory Feedback | Real-World Visual Data | Accuracy, Usability |
| Bertacchini et al.16 | To explore the use of a Pepper robot integrated with ChatGPT. | Pepper–ChatGPT Integration, Simulated Interaction Scenarios, Social Robotics Dialogue | Simulated ASD Interaction Scenarios | Feasibility, Acceptability, Effectiveness |
| Reddy et al.17 | To develop an assistive system to support individuals with paralyzed hands. | HGR, Real-Time Voice Output, Sensor-Based Input Processing | Custom Gesture Dataset | Accuracy, Responsiveness |
| Begum et al.18 | To develop an end-to-end system to aid communication for hearing-impaired individuals. | Quantized YOLOv4-Tiny Detection, Character to Text Generation, LSTM-based Text Model | BdSL 49 Dataset | mAP, Accuracy |
| Kandula et al.19 | To bridge communication gaps and memorize complex sign systems. | Webcam Hand Gesture Capture, Gesture Recording Pipeline, Model Training and Testing | Custom Webcam Gesture Data | Accuracy |
| Di Luzio, Rosato, and Panella20 | To enhance emotion classification via video by optimizing facial landmark inputs. | Facial Landmarks Detection, Binary DNN, Improved Integrated Gradient | Facial Video Dataset | Higher Accuracy, Reduced Cost |
| Slade et al.21 | To enhance SER accuracy and robustness. | AST with CSO, Optimizable 1D‑CNN, BiLSTM, CNN‑BiLSTM with Attention, NTKM | EMO‑DB, SAVEE, TESS | Accuracy, mAP |
| Neeraja et al.22 | To develop an accurate and real-time system for detecting driver somnolence, thereby improving road safety. | CV, Physiological Signal Monitoring, ML | Driver Drowsiness Datasets | Accuracy, Real-Time Performance |
| Ali and Hughes23 | To develop a model for emotion recognition using self-supervised pretraining techniques. | UBVMT, Self-Supervised Pretraining, Masked Autoencoding, Contrastive Modelling, Transformer Architecture | CMU-MOSEI, Public Biosensor Datasets | Accuracy, Memory Efficiency |
| Paul et al.24 | To develop a real-time attendance system for enhanced accuracy and practical deployment. | ResNet-50 Face Recognition, ViT Emotion Detection, Dual-Path Architecture, Web Integration | Custom Real-Time Dataset | Accuracy, AUC-ROC Score |
| Choi, Zhang, and Watkins25 | To enhance audio classification for improved performance. | SSAST, Multi-Layer Feature Fusion, Patch-Wise Pooling, Self-Supervised Learning, Dual Representation | CREMA-D, TESS, RAVDESS, Speech Emotion Classification, Isolated Urban Events, CornellBirdCall | Accuracy Rates |
| Wang and Chai26 | To optimize personalized learning paths and learning efficiency. | LSTM Behaviour Capture, Transformer Self-Attention, DL Integration | Learner Behaviour Sequences | Knowledge Mastery, Learning Time, Satisfaction |
| Ramani et al.27 | To accurately detect human emotions without manual feature engineering. | Deep Bidirectional LSTM, Multimodal Sensor Fusion, Iterative DL | On-Body, Ambient, Geographical Sensors | Accuracy, Effectiveness |
| Prithi and Tamizharasi28 | To enhance CRM for accurate emotion analysis. | FFDMLC, Feature Fusion, DL, COOT | CK+, FER2013 | Recognition Rate, Accuracy |
| Selvaraju et al.29 | To develop a real-time ISL gesture-to-text subtitle system for individuals with hearing and speaking impairments. | CNN, YOLOv5, HMM, WebRTC | ISL Gesture Video Dataset | Accuracy, Latency, Usability |
| Ghadami, Taheri, and Meghdari30 | To develop a transformer-based DL system for improved communication and learning. | Early and Late Fusion Transformers, GS, Keypoint Feature Extraction, Multi-Task Learning | 101 Iranian Sign Language Words | Accuracy, Real-Time Feedback |
| Khanum et al.31 | To develop an IoT-based wearable device to enhance women’s safety, including offline functionality for evidence preservation. | IoT Wearable Device, Real-Time Audio Tracking, Location Tracking, Emergency Alert System | Not Specified/Real-Time User Data | Response Time, Alert Accuracy |
| Siju and Selvam32 | To develop a system for accurate and efficient SLR compatible with edge devices. | Google MediaPipe Landmarks, DNN, TensorFlow Training, Live Webcam Testing | Hand Gesture Images (Peace, Okay, Stop) | Accuracy, Latency |
| Naik et al.33 | To develop a robust multimodal real-time emotion recognition system. | Text: BERT + TF-IDF, Audio: CNN + Augmentation, Video: CNN + OpenCV | Four Kaggle Datasets (audio, video, text) | Accuracy: 99.44% (audio), 97.66% (video); Validation Accuracy: 94.71% (audio), 65.38% (video) |
| Liu et al.34 | To enhance sentiment analysis accuracy for short texts. | TF-IDF, CSO, SVM, AdaBoost Soft Voting | Six Real Polar Sentiment Analysis Datasets | Accuracy Improvement: >4.5% |
| Filahi et al.35 | To improve e-commerce decision-making using IoT data and ML models. | LR, NB, SVM, RF, AdaBoost, GRU, LSTM | Customer Behaviour and Preference Data Collected via IoT Devices | Accuracy of 88%, F1-Score of 0.927, Precision of 0.908, Recall (class 0) of 0.569, Recall (class 1) of 0.947 |
| Sandulescu et al.36 | To develop an AI-driven healthcare platform. | IoMT Sensor Integration, AI Predictive Models, Emotion Detection Algorithm | Patient Sensor Data and Voice Recordings | Early Symptom Detection, Disease Progression Tracking |
| Muhammad et al.37 | To enhance emotion detection accuracy on imbalanced and limited data. | DeBERTa-v3-large + CNN, Electra + CNN, XLNet-base, RoBERTa + CNN, T5-base, Synonym Replacement Augmentation | ISEAR | Accuracy (best) of 94.94%, Accuracy (others) of 93%/69%, Improved Precision & Recall |
| Thiab, Alawneh, and Mohammad38 | To evaluate and enhance emotion classification in textual conversations. | RNN and Transformer-Based Models, Ensemble via Majority Voting | SemEval-2019 Task 3 (EmoContext) | Transformer F1-Score of 75.55%, RNN F1-Score of 67.03%, Ensemble F1-Score of 77.07% |
| Kumar, Khan, and Choi39 | To develop an accurate mental health detection model from social media text. | RoBERTa + Adapter Layers, BiLSTM, AM | Filtered GoEmotions Dataset | Accuracy (binary): 92%, Accuracy (multiclass): 88% |
| Geethanjali and Valarmathi40 | To improve multimodal sentiment analysis during the COVID-19 pandemic. | CNN, LSTM, IChOA, Feature Fusion Strategy | GeoCoV19 Dataset | Accuracy of 97.8% |
| Arbaizar et al.41 | To enable real-time, objective monitoring and prediction of psychiatric patients’ emotional states. | HMM, Transformer DNN, Time-Series Forecasting, Classification Algorithms | Passive and Self-Reported Data from the Evidence-Based Behaviour (eB2) App | Emotional state accuracy: 0.93, ROC AUC (valence): 0.98, ROC AUC (1-day prediction): 0.87, Accuracy (suicidal ideation): 0.9, ROC AUC (suicidal ideation): 0.77 |
| Kohneh Shahri, Afshar Kazemi, and Pourebrahimi42 | To develop and evaluate a comprehensive sentiment analysis model for improved accuracy and speed. | Image, Sound, and Text Processing, CV, NLP | Social Network Multimedia Data | Accuracy (High), Speed (Fast) |