Abstract
Mental stress is a prevalent issue in modern society, and detecting and classifying it accurately is crucial for effective interventions and treatment plans. This study aims to compare various machine learning (ML) algorithms for detecting mental stress using wearable physiological signal data and proposes a novel model that is automatic, high-performing, low-cost, and with lower time and computation complexity. The proposed model was trained and tested on a dataset of 200 participants, which involves applying four different stressors. Nine ML algorithms were investigated for both multivariate and univariate features. The physiological data was collected using a novel device developed using an Arduino microcontroller and low-cost sensors such as ECG, GSR, and ST sensors. The findings reveal that the suggested model detects mental stress with an accuracy of 96.17%, with the XGBoost method outperforming other algorithms in multivariate analysis. Univariate feature analysis found that XGBoost regularly demonstrated good accuracy, showing its dependability for detecting mental stress. The novel device created using low-cost sensors and automatic, high-performing algorithms is an effective and accessible tool for mental stress detection. Additionally, benchmark dataset validation (SWELL-KW, WESAD) confirmed the model’s robustness with accuracies of 92.38% and 94.21% respectively. A real-time pilot test on ten new participants utilizing the developed device validated the model’s practical value, with 97.5% classification accuracy and low latency. This study provides insights into the most effective ML algorithms for mental stress detection and creates a comprehensive and reliable resource for future research.
Keywords: ECG, GSR, Machine learning, Mental stress, Physiological signal, ST
Subject terms: Health care, Engineering
Introduction
Stress is the body’s reaction to a challenge or demand, commonly experienced as difficulty in dealing with specific tasks and conditions. In brief episodes, stress can be useful, for example when it helps a person avoid danger or meet a deadline. When stress persists and no steps are taken to manage it, it can develop into a chronic condition that affects health1. By altering both mind and body, stress contributes directly to psychological and physiological disorders and disease, affects mental and physical health, and degrades the quality of life. Reactions to stress differ between individuals: the same event can be distressing for one person but not for another, and for some people merely thinking about a challenging task can trigger stress. There is no obvious explanation for why one person feels less stressed than another when confronted with an identical stressor. There are several ways to measure a person’s mental stress. One of the most common methods is through self-report measures, where individuals report their level of stress using standardized questionnaires. However, self-report measures can be subjective and may not provide an accurate representation of the actual level of stress2. Another way to measure mental stress is through physiological measures. This involves monitoring the body’s response to stress through various physiological signals such as heart rate, electroencephalogram (EEG), electrodermal activity (EDA), and event-related potentials (ERP). These signals are typically measured using specialized devices such as heart rate monitors, EEG machines, and skin conductance sensors.
In recent years, ML and artificial intelligence techniques have also been applied to detect and measure mental stress. These techniques use algorithms to analyze physiological signals and behavioral patterns in order to identify signs of stress. Overall, measuring mental stress is a complex and multidimensional process, and a combination of different methods may provide the most accurate assessment of an individual’s stress level. Electrocardiogram (ECG), galvanic skin response (GSR), and skin temperature (ST) are all physiological measures that are sensitive to changes in autonomic nervous system (ANS) activity and are frequently used in mental stress detection studies3. In reaction to internal and external stimuli, the ANS regulates the body’s internal processes such as heart rate, blood pressure, and sweating. When an individual is under stress, the ANS responds by activating the sympathetic nervous system (SNS), which prepares the body for a fight-or-flight response. This activation causes a number of physiological changes, including a rise in heart rate, vasoconstriction, and sweating4.
The ECG is used to evaluate the electrical activity of the heart and can detect variations in heart rate and heart rate variability (HRV) that occur during stress. GSR detects changes in the electrical conductivity of the skin, which can be affected by changes in sweat gland activity. Changes in blood flow and sweat gland activity also affect ST, which can be used to recognize variations in SNS activity. Researchers can gain insights into the physiological changes that occur during stress by measuring changes in ECG, GSR, and ST, and develop methods to identify and manage stress in real-life situations.
The following are the study’s unique contributions:
This study proposes a novel ML framework for detecting mental stress using wearable physiological signal data and various bagging and boosting algorithms, providing insights into the most effective algorithm for mental stress detection.
The experimental results show that the random forest (RF) and XGBoost algorithms were the most effective of the nine ML algorithms tested, with 96.03% and 96.17% accuracy, respectively.
Individual physiological signals (ECG, GSR, and ST) are examined using univariate analysis to find unique patterns and features related to mental stress. The multivariate analysis takes into account all signals at the same time, exploiting their combined impact and interactions to improve stress detection accuracy.
To train and test the suggested model, this study developed a new and larger dataset of 200 participants. The dataset was created by using four distinct effective stressors, making it a comprehensive and dependable resource for mental stress detection research.
Related work
The use of wearable sensors for mental stress detection has been an active research area for some years, and numerous studies have investigated different physiological signals and ML algorithms for this purpose. Among the different types of physiological data, ECG, GSR, and ST are some of the most commonly used signals due to their importance in assessing cardiovascular health, emotional arousal, and thermal regulation, respectively. This section discusses the most relevant recent studies in this field.
Literature review
Zubair Muhammad and Yoon Changwoo5 selected 14 volunteers and collected data with a commercial Pulse sensor. The approach extracts characteristics from PPG signals and offers a new set of features to quantify temporal information. Using SVM, the proposed method classified five different levels of mental stress with an accuracy of 94.33%. Also, the system was tested on a different stressor dataset, demonstrating its capacity to detect diverse mental stress states employing ultra-short-term recordings from a cost-effective PPG sensor. This research demonstrates the suggested system’s capacity to detect and quantify mental stress levels. The study6 proposes a multi-sensor approach for detecting stress using physiological and sociometric sensors. The physiological sensors include ECG, respiration, and skin conductance, while the sociometric sensor is a wearable microphone. The authors collected data from 25 participants who were subjected to two stress-inducing tasks. They used several ML algorithms, including SVM, KNN, and decision tree, to classify stress. The results showed that combining physiological and sociometric sensors led to better stress classification compared to using physiological sensors alone. The best accuracy achieved was 87% using SVM. The study’s limitations included the small sample size, the limited number of stressors used, and the use of only healthy participants. The study7 involved 15 participants, and the sensors used were EDA sensors and ECG sensors for data collection and stress detection. The participants performed a stress-inducing task of public speaking, and their physiological data were collected during the task. The data was then used to extract features, which were fed into ML algorithms to classify stress levels. The system achieved an accuracy of 75.9% for stress level classification. 
The study also highlighted the potential of incorporating sociometric data, such as facial expressions and voice patterns, to improve stress detection accuracy. The study’s disadvantages include a small number of participants and the need for additional research to determine the system’s efficacy in real-world scenarios.
The study8 presents a novel method for detecting stress by combining deep neural networks (DNNs) with ECG and EDA. The authors collected physiological data from 50 participants who performed different stress-inducing tasks, including public speaking and mental arithmetic. They then preprocessed the data and trained three DNN models with different architectures using various combinations of ECG and EDA signals. The authors achieved an overall accuracy of 87.3% for stress detection with the best-performing model using both ECG and EDA signals. The study suggests that DNNs can be a promising approach to accurately detect stress using physiological signals, and the proposed method can be used in real-world settings for stress monitoring and management. The study9 proposes a hybrid model for detecting stress using real-time data analytics and the Internet of Things. The study focuses on the use of GSR and ECG sensors to monitor the physiological parameters of 34 participants while they undertake five different tasks designed to induce stress. The study computes the accuracy of different ML models, including Logistic Regression, Support Vector Machine, K-Nearest Neighbors, Bagging Classifiers, Random Forest, Gradient Boosting, and Artificial Neural Network, to classify the mental state of the participants into four categories: relaxed, stressed, partially stressed, and happy. The hybrid model used a synthetic minority oversampling technique to deal with the imbalance class problem and achieved a high accuracy rate of 99.4% on the self-generated dataset. The study emphasizes the possibility of applying real-time data analytics to improve the quality of healthcare services, such as stress detection and diagnosis. Wearable sensors and machine learning algorithms have shown promising results for detecting mental stress. However, limitations such as small sample sizes and limited stressors used in studies need to be addressed.
This study10 focuses on the feasibility of tracking both physical and mental stress in construction workers using physiological signals and machine learning. The study emphasizes the necessity for a thorough stress evaluation, emphasizing the interrelationship of physical and mental stress. Data were acquired from 8 volunteers who wore a multi-sensor vest that assessed their heart rate, skin temperature, breathing rate, and skin conductance while performing physical and mental tasks. ML algorithms accurately classified stress levels, achieving up to 94.7% for simultaneous monitoring with bagged trees. The results showed that integrating physiological signals and applying person-specific normalization considerably enhanced prediction ability. Lee et al.11 proposed a model using ultra-short-term HRV analysis with EMD-derived features to detect mental stress. The 26 features were evaluated using data from 74 police officers who were exposed to acute stressors such as the Trier Social Stress Test (TSST) and horror movie screenings. Using an SVM classifier, the approach obtained an accuracy of up to 90.5%. According to this study, 2–3 min of HRV data are adequate for reliable stress detection, indicating the possibility for real-time applications in wearable devices. The study 12 presents Shuffled ECA-Net, a deep learning model for stress detection that uses multimodal wearable sensors to measure ECGs, respiratory waveforms, and electrogastrograms (EGGs) with their own developed device. Stress levels were identified using salivary cortisol to ensure objectivity. The model uses a unique Shuffled Efficient Channel Attention module to improve sensor fusion by considering inter-modality interactions. When evaluated with five-fold cross-validation, it outperformed baseline models (accuracy: 0.916, AUROC: 0.964). While multimodal fusion enhanced stress detection, the model’s generalizability was limited by a small 26-participant dataset and low inter-subject performance. 
The work focuses on practical, non-intrusive stress monitoring, with the potential for real-world applications and future optimization. This study13 develops an automatic technique for detecting mental stress in researchers using single-lead ECG readings acquired via a wearable smart T-shirt. Data were collected from 20 male researchers, with 1,800 min of ECG data divided into 1-min intervals, and three HRV features were extracted. Decision Tree (DT) models outperformed the others, obtaining 93.30% accuracy for intra-subject classification and 94.10% for inter-subject classification. Flexible dry electrodes enabled comfortable and quick data collection. The method exceeded previous approaches, demonstrating its promise for real-time, non-invasive stress monitoring in high-risk settings. The study14 introduces a novel method for detecting driving stress that employs nonlinear representations of short-term physiological variables (30 s or less) and multimodal convolutional neural networks (CNNs). The method transforms GSR (hand and foot) and HR data into continuous recurrence plots (Cont-RPs), which are then analyzed by CNNs for stress-related characteristics. These features are integrated to create a representation vector for classification. When tested on a real-world driving dataset of nine people, the methodology obtained 95.67% accuracy for 30-s signals and 92.33% for 10-s signals, exceeding earlier methods and proving effective even with short-term data. This emphasizes the possibility of real-time stress detection in driving scenarios.
Despite these promising results, more study is needed to enhance the detection mechanism indicated in these studies. Some studies, for example, have highlighted the need for more diverse and larger datasets to improve algorithm robustness and generalizability3,15. This study proposes a new approach using physiological data collected through wearable sensors. It will involve a larger and more diverse participant population and use a variety of stressors to improve generalizability. The approach has the potential to improve stress management and monitoring in real-world settings.
Research gap
While the previous studies made significant advances in the area of detection of mental stress using wearable sensors and ML techniques, there are still some gaps that are worth highlighting to further advance this research field:
In most of the studies, the dataset includes fewer than 100 participants, which restricts the robustness and generalizability of the findings. This paper addresses this research gap by using a dataset of 200 participants, which is larger than those in many existing works, collected under four diverse stress-inducing scenarios to improve the performance of the model.
Prior studies often use a limited number of stressors, which do not adequately represent the variety of real-world stress conditions and make it difficult to generalize models to real-life applications. This study includes four different stressors to increase data variability and generate more data points.
While some previous work uses physiological signals such as ECG or EDA, only a few studies examine the combined impact of multiple signals such as ECG, GSR, and ST. In this study, a multimodal approach is applied to provide richer information and improve stress detection accuracy. In addition, univariate and multivariate analyses are compared to show the importance of using multiple modalities for better accuracy.
Many studies rely on high-cost wearable devices for data collection, which may limit their practicality and scalability in real-world settings, especially in low-resource environments. This study proposes a low-cost and computationally efficient system that integrates wearable sensors and ensemble ML techniques to improve accessibility and practicality.
Experimental protocols
This section outlines the experimental procedures used to gather data on mental stress using an IoT device that included ECG, GSR, and ST sensors. Real-time monitoring of the physiological reactions associated with stress was made possible by the development of this system and the selection of certain sensors.
IoT device and sensors (setup & placement)
A device has been developed that uses three sensors, namely an ECG sensor (Heart Rate monitor AD 3282), a GSR module (Grove GSR_sensor), and an ST sensor (DS18B20 temperature sensor). The device is depicted in Figs. 1(a) and 1(b). Data collection was performed with this device following the study protocol outlined in Fig. 2, and a participant wearing all the sensors is shown in Fig. 1(c). In addition to the sensors, the device comprises a USB power supply (to power the device), a Real-Time Clock (RTC) module (for timekeeping), a Micro SD card module (for data storage), an Arduino Mega and UNO (for sensor connections), and a TFT LCD display (for real-time display of ECG, GSR, and ST values). The ECG sensor consists of three electrodes (pads) that are placed at specific locations on the chest to detect the electrical signals from the heart. The GSR sensor was placed on the palmar surface of the distal joint of the index and middle fingers. This area of the finger has the highest density of sweat glands, which makes it the most sensitive location for measuring changes in skin conductance. The DS18B20 temperature sensor was placed under the participant’s armpit. This location is easy to access and provides a relatively stable and consistent temperature measurement. When placing the sensor under the armpit, it is essential to ensure that the sensor is in contact with the skin and that the sensor tip is positioned in the center of the armpit.
Fig. 1.
(a) IoT device with ECG sensor, GSR module, and ST sensor. (b) Prototype and components of the IoT device. (c) A participant wearing the IoT device during a stressor test [A: ECG sensor, B: GSR sensor, C: ST sensor, D: IoT device for collecting data].
Fig. 2.
Study protocol used here.
Participants and study protocol
The purpose of this study is to evaluate stress responses in a controlled environment, using a diverse group of participants to ensure that the findings are generalizable. The study approach was designed to accurately capture physiological and psychological stress markers by recruiting healthy volunteers and conducting standardized stress-inducing tasks. Details on the participants and the study protocol are provided below.
Participants details
For this experiment, 200 undergraduate students (128 males and 72 females) were recruited. Of these, 120 subjects were from Rajendra Institute of Medical Sciences (RIMS), Ranchi, India, and 80 subjects were from Birla Institute of Technology (BIT), Mesra, Ranchi, India. The participants were all in excellent health and ranged in age from 20 to 26 years. None of the participants had a chronic condition such as a cardiovascular or mental disorder. Following an explanation of the experiment’s goal and protocol, participants were screened using a self-report questionnaire. Before data collection, the same questionnaire was used to gather personal information such as name, age, gender, dominant hand, height, weight, blood group, education, hometown, annual family income, and number of siblings (excluding the participant). For twelve hours before the data collection procedure, all participants were asked not to consume any medicine, alcohol, coffee, or tea. The participants were instructed not to speak and to limit their movement during the test. They were seated in a comfortable room in front of two computers, one for completing the stressor activities and the other for monitoring the timer for each stressor. They also completed two standard questionnaires, the State-Trait Anxiety Inventory (STAI)16 and the Perceived Stress Scale with 10 questions (PSS-10)17.
The participants’ socioeconomic backgrounds are diverse, as shown by a self-reported questionnaire that asked about family income, education, hometown, and other socioeconomic factors. The gender distribution is 128 males (64%) and 72 females (36%), providing a reasonable gender balance. The age range of 20 to 26 years is typical of young people, who are an important group in stress research, particularly for academic pressures. However, while this sample provides useful insights into stress detection in young adults, the findings may not be directly applicable to other age groups, such as older adults or children, or to populations with distinct health profiles, such as those with pre-existing diseases. The questionnaires were employed for additional demographic and psychological profiling. Although the dataset prioritizes a specific age group (as it specifically focuses on Indian Students) and demographic, it was created to replicate real-life mental stress scenarios using four different stress-inducing exercises. This controlled environment ensures the accuracy of the acquired data while also providing a solid platform for stress classification research.
Study protocol details
Four stressors, illustrated in Fig. 2, are used in this study to induce stress in participants. They are a combination of psychological, physical, and environmental stressors intended to provide a comprehensive evaluation of the participant’s response to stress. Each stressor lasted 2.5 min, which ensured consistency between trials and allowed precise observation of physiological and psychological stress markers. The ground truth of this study is described in section "Ground truth". To minimize environmental noise, data collection was conducted in a calm laboratory setting. Participants were comfortably seated in front of a screen and kept their movement to a minimum to ensure optimal signal acquisition. Each session, which recorded the physiological signals of one participant at a time, lasted about 10 min, and data were acquired for four stress-inducing tasks:
Watch funny videos- This is a non-stressful activity used to record physiological signals during relaxation. It is classified as both a psychological and an environmental stressor.
Arithmetic test- It puts participants under cognitive stress by challenging them to answer mathematical problems under time limitations. It is a psychological stressor that increases cognitive load and causes stress due to the pressure of performance and time constraints.
Listen to Favorite Music- This activity was meant to provide psychological and environmental relief by involving participants in a calming and enjoyable activity that evaluated recovery and relaxation benefits. Music affects emotional states and may prevent stress-induced changes.
Stroop Color Word Test- This test challenged participants to identify the ink color of words spelling out different colors, requiring both automatic and careful reading. It is a psychological stressor that assesses attention, cognitive control, and intrusion management skills.
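The protocol above yields one continuous recording of about 10 min per participant, made up of four consecutive 2.5-min tasks. As a minimal sketch, such a recording could be cut into per-task windows as shown below. The sampling rate, the task identifiers, and the assumption that the tasks were run in the order listed are all hypothetical and are not taken from the study.

```python
def split_session(samples, sampling_rate_hz, task_names, task_seconds=150):
    """Cut one continuous recording into consecutive fixed-length task windows."""
    window = int(task_seconds * sampling_rate_hz)  # samples per 2.5-min task
    segments = {}
    for index, name in enumerate(task_names):
        start = index * window
        segments[name] = samples[start:start + window]
    return segments


# Hypothetical task identifiers, assumed to follow the order of the list above.
TASKS = ["funny_video", "arithmetic_test", "favorite_music", "stroop_test"]
RATE_HZ = 4  # hypothetical sampling rate, chosen only for illustration

# Dummy 10-minute recording: one integer per sample.
recording = list(range(len(TASKS) * 150 * RATE_HZ))
segments = split_session(recording, RATE_HZ, TASKS)

for name, segment in segments.items():
    print(name, len(segment))
```

Each window can then be passed to feature extraction independently, so that features from one task never mix with samples from the next.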
Ethical approval statement
The experimental protocols used in this study were approved by the Department of CSE, Birla Institute of Technology (BIT), Mesra, Ranchi, India (Approval No: CSE/HoD/Certificate/2023–24/164). All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Informed consent statement
Informed consent was obtained from all individual participants included in the study for the publication of identifying information/images in an online open-access publication. For the participant image provided in Fig. 1(c), specific informed consent was obtained from the participant for the use of his image in this publication.
Device validation methodology
The reliability of the developed device was validated by comparing its ECG and GSR measurements with the benchmark datasets SWELL-KW18 and WESAD19, which were collected using commercial-grade sensors. Because the SWELL-KW dataset does not include ST data, only the ST data from WESAD, together with the Stress Lysis dataset20,21, was used to validate the ST measurements. Table 1 summarizes the comparison, which shows strong alignment between the developed device’s measurements and the benchmark datasets and supports its reliability for physiological data collection.
Table 1.
Comparison of the developed device with benchmark datasets for validation.
| Signal | Parameter | Developed device (Mean ± SD) | SWELL-KW (Mean ± SD) | WESAD (Mean ± SD) | Deviation | Commercial device (SWELL-KW) | Commercial device (WESAD) |
|---|---|---|---|---|---|---|---|
| ECG | IBI (ms) | 935–1024 | 910–1050 | 900–1100 | < 5% | MOBI device (TMSI) | RespiBAN Professional |
| GSR | Skin conductance (µS) | 3.42 ± 0.15 | ~ 3.40 ± 0.14 | ~ 3.50 ± 0.18 | < 3% | MOBI device (TMSI) | RespiBAN Professional |
| ST | Skin temperature (°C) | 34.1 ± 0.3 | N/A | 33–36 | < 2% | – | RespiBAN Professional |
SWELL-KW’s MOBI device (TMSI) is a research-grade tool for ECG and GSR measurements, with high physiological data acquisition accuracy. The RespiBAN Professional, used in WESAD, is a chest-worn multisensor device capable of recording ECG, GSR, and temperature data with high precision. The ST sensor readings (mean ± SD: 34.1 ± 0.3 °C) fell within the normative range (33–37 °C) reported in the Stress Lysis dataset. The ECG IBI readings (< 5% deviation), GSR readings (< 3% deviation), and ST readings (< 2% deviation) are well aligned with the commercial-grade sensors used in the SWELL-KW and WESAD datasets, indicating the device’s reliability in providing physiological data.
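The text does not state how the deviation figures in Table 1 were computed. As an illustrative check only, the sketch below assumes a simple relative difference between the device mean and each benchmark mean, applied to the GSR values listed in the table. The function name is hypothetical.

```python
def percent_deviation(device_mean: float, reference_mean: float) -> float:
    """Relative difference of two means, as a percentage of the reference.

    Assumed definition; the study does not specify its own formula.
    """
    return abs(device_mean - reference_mean) / abs(reference_mean) * 100.0


# GSR means in microsiemens, as listed in Table 1.
device_gsr = 3.42
benchmark_gsr = {"SWELL-KW": 3.40, "WESAD": 3.50}

gsr_deviation = {
    name: percent_deviation(device_gsr, reference)
    for name, reference in benchmark_gsr.items()
}

for name, deviation in gsr_deviation.items():
    print(f"{name}: {deviation:.2f}% deviation")
```

Under this assumed definition both GSR deviations stay below the 3% bound reported in the table.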
Benchmark datasets
To validate the reliability of the developed device and assess the generalization of the proposed framework, two publicly available benchmark datasets namely SWELL-KW and WESAD were utilized. These datasets contain physiological signals recorded using commercial devices under controlled stress-inducing conditions which can be used as standard references for stress classification models. These datasets assist in comparing the accuracy and dependability of our custom-developed wearable device to commercial-grade sensors. The comparison results are shown in Table 1.
SWELL-KW dataset
This dataset consists of physiological and behavioral data from 25 participants performing knowledge-based work tasks under the following three conditions.
Neutral- It was a baseline condition that included regular working conditions with no stressors.
Time pressure- It was a stress condition in which the task difficulty and deadlines were increased to induce cognitive stress.
Interruptions- It was also a stress condition where frequent external disturbances were created during regular tasks to stimulate workplace distractions.
This dataset consists of ECG and GSR signals collected using the MOBI device (TMSI), which is a high-precision research-grade wearable sensor. This dataset does not include ST measurements.
WESAD dataset
This dataset includes physiological recordings from 15 participants undergoing three experimental conditions.
Baseline- Data was recorded when the participant was in normal resting condition.
Stress- Data was recorded using TSST when the participant was exposed to public speaking and mental arithmetic tasks.
Meditation- Data was recorded when the participant was doing guided breathing exercises for relaxation.
The dataset consists of ECG, GSR, and ST signals collected using the RespiBAN Professional device which is a commercial wearable device.
Methodology
The dataset used in this study was prepared by exposing the participants to stressors for the purpose of detecting stress and includes data from 128 males and 72 females. The methodology is illustrated in Fig. 3.
Fig. 3.

Flowchart of mental stress detection model used in this study.
Preprocessing of dataset
Eleven HRV features were extracted from ECG data and then filtered using a notch filter with a cutoff frequency of 0.05. Seven GSR features were obtained from the GSR signal and then filtered using a low-pass Butterworth filter before computing the first derivative of the filtered signal. Additionally, because the GSR signal is non-stationary, the Discrete Wavelet Transform (DWT) was applied to decompose it into its frequency components, known as the approximation and detail coefficients. Thirteen ST features were extracted, and a Butterworth filter was then applied with a lower cutoff frequency of 5 Hz at a sampling rate of 1000 Hz.
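The first-derivative and DWT steps described above for the GSR signal can be illustrated with a short sketch. It covers only those two steps, not the Butterworth or notch filtering. The text does not name the wavelet family or the decomposition level, so the one-level Haar wavelet used here is an assumption, as are the sample values and the sampling rate.

```python
import math


def first_derivative(signal, sampling_rate_hz):
    """Forward-difference approximation of the derivative, in units per second."""
    return [(b - a) * sampling_rate_hz for a, b in zip(signal, signal[1:])]


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients.

    The Haar wavelet is an assumption; the study does not name the wavelet used.
    The sign convention of the detail coefficients varies between libraries.
    """
    if len(signal) % 2:
        signal = signal[:-1]  # drop the last sample so the samples pair up
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical, already low-pass-filtered GSR samples in microsiemens.
gsr = [3.40, 3.41, 3.43, 3.47, 3.46, 3.44, 3.42, 3.42]

slope = first_derivative(gsr, sampling_rate_hz=4)  # hypothetical 4 Hz rate
approx, detail = haar_dwt(gsr)

print("approximation:", [round(a, 4) for a in approx])
print("detail:", [round(d, 4) for d in detail])
```

The approximation coefficients carry the slow trend of the signal and the detail coefficients carry the fast changes, which is why the transform suits a non-stationary signal such as GSR.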
Feature selection and computation
Feature selection and computation were done using multivariate and univariate analysis. The list of extracted features from all the sensors used in this study is described in Table 2.
Table 2.
List of extracted features from all the sensors.
| Physiological signal source | Types of features | Features | Feature description |
|---|---|---|---|
| ECG | Time-domain | IBI | Inter-beat interval |
| | | SDNN | Standard deviation of intervals between adjacent beats |
| | | SDSD | Standard deviation of successive differences between adjacent R-R intervals |
| | | RMSSD | Root mean square of successive differences between adjacent R-R intervals |
| | | pNN20 | Proportion of differences between R-R intervals greater than 20 ms |
| | | pNN50 | Proportion of differences between R-R intervals greater than 50 ms |
| | Statistical | MAD | Median absolute deviation |
| | Non-linear | SD | Poincare analysis, SD |
| | | SD1 | Poincare analysis, SD1 |
| | | SD2 | Poincare analysis, SD2 |
| | | SD1/SD2 | Poincare analysis, SD1/SD2 |
| GSR | Frequency-domain | FT | Fourier transform |
| | | DWT | Discrete wavelet transform |
| | Statistical | Mean | Mean |
| | | Variance | Variance |
| | | Kurtosis | Kurtosis |
| | | Skewness | Skewness |
| | Energy-related | EM | Energy measure |
| ST | Statistical | Mean | Mean |
| | | Variance | Variance |
| | | SD | Standard deviation |
| | | RMS | Root mean square |
| | | Median | Median |
| | | MAD | Mean absolute deviation |
| | Range-based | Min–Max | Min–max |
| | | Range | Range |
| | Derived | MSAD | Mean square of the approximate 1st and 2nd derivatives |
| | | AA | Absolute area |
| | | WL | Waveform length |
| | | ZCR | Zero-crossing rate, number of crossings |
| | | AC | Average value of absolute derivative |
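Several of the ECG features in Table 2 follow directly from a list of inter-beat intervals. The sketch below uses the commonly cited definitions of these HRV measures, including the usual Poincare relations SD1² = SDSD²/2 and SD2² = 2·SDNN² − SDSD²/2. These are assumed to match the study's definitions, and the interval values are hypothetical.

```python
import math
import statistics


def hrv_features(ibi_ms):
    """Compute common HRV features from inter-beat intervals in milliseconds."""
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]  # successive differences
    sdnn = statistics.stdev(ibi_ms)
    sdsd = statistics.stdev(diffs)
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Proportions (0 to 1) of successive differences above each threshold.
    pnn20 = sum(abs(d) > 20 for d in diffs) / len(diffs)
    pnn50 = sum(abs(d) > 50 for d in diffs) / len(diffs)
    # Poincare descriptors derived from SDNN and SDSD (assumed definitions).
    sd1 = sdsd / math.sqrt(2.0)
    sd2 = math.sqrt(max(2.0 * sdnn ** 2 - 0.5 * sdsd ** 2, 0.0))
    return {
        "SDNN": sdnn, "SDSD": sdsd, "RMSSD": rmssd,
        "pNN20": pnn20, "pNN50": pnn50,
        "SD1": sd1, "SD2": sd2, "SD1/SD2": sd1 / sd2 if sd2 else float("nan"),
    }


# Hypothetical inter-beat intervals in milliseconds.
ibi = [950, 980, 1010, 960, 940, 1000, 990, 935]
features = hrv_features(ibi)

for name, value in features.items():
    print(f"{name}: {value:.3f}")
```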
Feature selection for multivariate analysis
For multivariate analysis, features from all three physiological signals (ECG, GSR, and ST) were combined in order to take advantage of their complementary nature. By combining features from several signals, the multivariate technique provides a comprehensive perspective of stress-induced physiological changes, allowing for robust and reliable stress detection.
Feature extraction- It was applied to ECG, GSR, and ST data and 31 features were extracted based on their physiological relevance and sensitivity to stress response.
Pre-processing- FastICA algorithm22 has been applied to the input data which attempted to find the independent components that were most informative for the given problem. Here, the extracted components were then used as features for training and testing the ML model where 75% data was used for training and the other 25% for testing. Then Standard Scaler is used to scale the features of the training and test data for the modalities (ECG, GSR, and ST) to have a mean of 0 and a standard deviation of 1. This is done before training a ML model on the data, to expand the performance of the model.
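The split and scaling steps described above can be sketched as follows. This is a minimal standard-library illustration, assuming that the scaling statistics are estimated on the training portion and then reused for the test portion; the FastICA step is omitted, and the helper names `split_train_test`, `fit_standardizer`, and `transform` are hypothetical.

```python
import random
import statistics


def split_train_test(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows and split them into training and test portions."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(rows)))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]


def fit_standardizer(rows):
    """Estimate the per-feature mean and population standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant feature has zero spread; fall back to 1.0 to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def transform(rows, means, stds):
    """Scale each feature to zero mean and unit standard deviation."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical feature rows (two features per sample).
data = [[float(i), 10.0 * i] for i in range(8)]
train, test = split_train_test(data)
means, stds = fit_standardizer(train)
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)
```

Reusing the training statistics for the test rows keeps information from the test portion out of the scaling step.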
Feature selection for univariate analysis
Features were chosen differently for each physiological signal based on their capacity to capture important physiological responses to stress within each signal type. Here, each feature was evaluated for its sensitivity to stress-related changes within its corresponding physiological signal. The selected features were utilized to train ML models separately for each signal type, allowing stress classification based on univariate data.
ECG features- Time-domain features were selected to indicate variations in heart rate and ANS activity while statistical features increased robustness by capturing signal variability23.
GSR features- Frequency-domain features can quantify variations in skin conductance frequency24. In addition, statistical metrics and energy-related features provided further dimensions for capturing electrodermal activity variations during stress.
ST features- Statistical features, range-based metrics, and derived features captured thermal regulation dynamics influenced by stress.
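To make the range-based and derived ST features more concrete, the sketch below computes a few of them from a short temperature series. The exact definitions used in the study are not given in the text, so the formulas here (for example, counting zero crossings on the mean-removed signal) are assumptions, and the function name `st_features` is hypothetical.

```python
import math
import statistics


def st_features(x):
    """Compute a few statistical, range-based, and derived features of a signal."""
    if len(x) < 2:
        raise ValueError("need at least two samples")
    # Absolute first differences approximate the absolute derivative.
    steps = [abs(b - a) for a, b in zip(x, x[1:])]
    mean = statistics.fmean(x)
    centered = [v - mean for v in x]
    # Count sign changes of the mean-removed signal (assumed definition).
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return {
        "Mean": mean,
        "Range": max(x) - min(x),
        "RMS": math.sqrt(statistics.fmean([v * v for v in x])),
        "WL": sum(steps),                # waveform length
        "AC": statistics.fmean(steps),   # average absolute derivative
        "ZC": crossings,                 # number of zero crossings
    }


# Hypothetical values used only to exercise the function.
example = st_features([1.0, 3.0, 2.0, 4.0])
```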
Analysis methods
K-nearest neighbors (KNN)
Using a similarity metric such as Euclidean distance, KNN25 determines the K nearest data points to a specific test data point. The algorithm then predicts the class of the test data point as the majority class among its K nearest neighbors. This approach does not make any assumptions about the underlying data distribution and can handle non-linear relationships between input and output variables. The disadvantage of KNN is that it can be computationally expensive for large datasets because the method must compute the distance between each test point and every training data point. KNN is also sensitive to the value of K and the distance metric used, both of which can affect prediction accuracy.
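The prediction rule described above can be sketched in a few lines. This is a minimal illustration assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` from its k nearest training points."""
    # Distance from the query to every training point (the costly step for large data).
    dists = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    # Majority vote among the k closest neighbors.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples.
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["no-stress", "no-stress", "stress", "stress"]
prediction = knn_predict(train_x, train_y, (5.0, 6.0), k=3)
```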
Support vector machine (SVM)
SVM26 is an ML algorithm that finds the hyperplane with the greatest margin between positive and negative input points. The hyperplane is defined by a weight vector w and a bias term b, and the distance between a data point x and the hyperplane is measured as:
$$d(x) = \frac{|w \cdot x + b|}{\lVert w \rVert}$$
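SVM solves an optimization problem to find the hyperplane that maximizes the margin while minimizing the classification error. The optimization problem can be stated in two ways, a primal form and a dual form; the dual form is:
$$\max_{\alpha} \; \sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C, \;\; \sum_{i} \alpha_i y_i = 0$$
where,
$\alpha_i$ are the Lagrange multipliers
$y_i$ and $y_j$ are the class labels for data points $x_i$ and $x_j$, respectively
$x_i \cdot x_j$ is the dot product of the feature vectors $x_i$ and $x_j$
$C$ is a hyperparameter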
Decision tree (DT)
In classification, the DT algorithm divides data into two subsets and uses impurity criteria to determine the best feature for each split. Common criteria include Gini impurity and cross-entropy. The algorithm constructs a tree, with internal nodes serving as feature tests and leaves serving as class labels. It selects the best feature to divide the data and then repeats the process for each subset until the tree is built. Based on the input feature values, the DT predicts the class label by traversing the tree from root to leaf27.
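The impurity criteria mentioned above can be illustrated with a short sketch that evaluates candidate thresholds on a single feature. It assumes binary splits at midpoints between sorted feature values; the function names are hypothetical and the example is not the study's implementation.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Cross-entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels, impurity=gini):
    """Return (threshold, weighted impurity) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    best_score, best_thr = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Size-weighted average impurity of the two subsets.
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / len(pairs)
        if score < best_score:
            best_score, best_thr = score, thr
    return best_thr, best_score
```

A full tree would apply `best_threshold` to every feature at each node and recurse on the two resulting subsets.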
Random forest (RF)
RF is an ensemble learning algorithm that combines multiple decision trees to enhance classification accuracy and robustness. To prevent overfitting and increase variety, each tree in the forest is trained on a randomly selected portion of its training data and a random subset of the input features. The final prediction is obtained by aggregating all of the trees’ predictions, either by majority vote or by averaging the probabilities. RF can also determine the significance of each input feature based on the reduction in impurity obtained by using that feature in the trees. This feature significance can be used for feature selection or model interpretation28.
Bagging SVM
Bagging SVM is an ML ensemble algorithm that combines the power of SVMs with the bagging concept. Bagging is a technique that involves training numerous models on different subsets of data and combining their results to make a final prediction. In Bagging SVM, multiple SVM-based models are trained on various subsets of the data, and their outputs are merged via averaging or voting to obtain the final prediction. This method enhances the model’s robustness and generalizability while decreasing prediction variance. By combining the predictions of numerous SVM models, Bagging SVM improves on the performance of a single SVM model29. In this study, an SVM classifier was built with an RBF kernel and regularization parameter C = 1. A bagging classifier was then created with 10 base classifiers, each using 70% of the training data, with the SVM classifier as the base estimator.
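The bagging procedure itself can be sketched independently of the base learner. The example below uses 10 base models, each trained on a 70% sample of the training data, as in the configuration described above. To stay self-contained it swaps in a simple nearest-centroid classifier as a stand-in for the RBF-kernel SVM, and it assumes sampling with replacement; both choices, and all function names, are assumptions of this sketch.

```python
import random
from collections import Counter


def fit_centroids(xs, ys):
    """Stand-in base learner: store the mean feature vector of each class."""
    sums, counts = {}, Counter(ys)
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


def fit_bagging(xs, ys, n_estimators=10, sample_fraction=0.7, seed=0):
    """Train each base model on a random sample of the training data."""
    rng = random.Random(seed)
    m = max(1, int(sample_fraction * len(xs)))
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(xs)) for _ in range(m)]  # sampling with replacement
        models.append(fit_centroids([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Combine the base models' predictions by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Replacing `fit_centroids` and `predict_centroid` with an SVM trainer and predictor would give the Bagging SVM structure described in the text.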
Bagging DT
The Bagging DT algorithm30 creates several DTs from different training data samples and feature subsets. These decision trees are then combined to create a powerful learner that determines the final classification. By generating a diverse set of decision trees, the bagging procedure serves to reduce overfitting. The Bagging DT algorithm uses majority voting or weighted voting to combine the outcomes of the individual decision trees. The weighted voting algorithm is shown below:
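$$f(x) = \arg\max_{c} \sum_{i=1}^{N} w_i \, \mathbb{1}\left[t_i(x) = c\right]$$
where f(x) is the predicted class label for input x, ti(x) is the class label predicted by the ith decision tree, wi is the weight given to the ith decision tree based on its performance on the validation set, N is the number of decision trees, and c ranges over the class labels.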
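A weighted vote of this kind can be sketched as follows: each tree's predicted label contributes its weight, and the label with the largest total wins. The function name `weighted_vote` and the example weights are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across the base models."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per prediction")
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)


# One strongly weighted tree can outvote two weakly weighted ones.
label = weighted_vote(["stress", "no-stress", "no-stress"], [0.6, 0.2, 0.1])
```

With equal weights the rule reduces to ordinary majority voting.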
AdaBoost
AdaBoost is a boosting algorithm for classification that combines multiple weak classifiers to produce a strong classifier. The algorithm trains weak classifiers iteratively on various subsets of the training data, with higher weights assigned to misclassified samples from earlier iterations. The final classifier is a weighted sum of the weak classifiers, with the weights decided by the classification accuracy of the weak classifiers. The AdaBoost method can be formulated as an optimization problem of minimizing the exponential loss function, with the weights updated using the AdaBoost update equation at each iteration31.
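The weight update mentioned above can be illustrated with one boosting round. The sketch assumes the discrete AdaBoost variant with labels in {-1, +1}; the study does not state which variant was used, and the function name `adaboost_round` is hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """Return the classifier weight (alpha) and the renormalized sample weights."""
    # Weighted error rate of the current weak classifier.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples are up-weighted, correctly classified ones down-weighted.
    updated = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Four equally weighted samples, the last one misclassified.
alpha, new_weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
```

In this example the misclassified sample ends up carrying half of the total weight, so the next weak classifier concentrates on it.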
Gradient boosting (GB)
Gradient boosting is an ML algorithm that combines multiple weak classifiers to produce a strong classifier. The algorithm trains weak classifiers sequentially, fitting each new classifier to the negative gradient of a loss function evaluated on the current ensemble’s predictions, so that samples the ensemble currently handles poorly receive more attention at each iteration. The final classifier is a weighted sum of the weak classifiers, with the weights determined by the contribution of each weak classifier to the overall loss function32.
Extreme gradient boosting (XGBoost)
XGBoost is an ensemble ML algorithm that improves the performance of classification models by using gradient boosting. XGBoost iteratively builds a sequence of decision trees, each one trained to correct the errors of the previous trees. XGBoost uses a gradient-based method to minimize an objective that is the sum of the loss function evaluated for each instance and a regularization term. The choice of loss function depends on the problem being solved; for binary classification, binary cross-entropy (logistic loss) is typically used33.
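As an illustration, the objective described above is commonly written in the following regularized form. The notation is assumed for this sketch and is not taken from the study: $l$ is the per-instance loss, $f_k$ is the $k$th tree, $T$ is the number of leaves of a tree, $\omega_j$ are its leaf scores, and $\gamma$ and $\lambda$ are regularization hyperparameters.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^{2}

% Logistic (binary cross-entropy) loss for a label y_i in {0, 1}:
l\left(y_i, \hat{y}_i\right) = -\left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right],
\qquad
p_i = \frac{1}{1 + e^{-\hat{y}_i}}
```

The first sum measures how well the ensemble fits the training labels, while the second penalizes trees with many leaves or large leaf scores.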
Ground truth
The ground truth for this study is the binary labeling of the data as either stress or no stress. During the experiment, participants were subjected to four different stressors, and their physiological responses were recorded using wearable sensors, including ECG, GSR, and ST sensors. The data acquired during stressful conditions were labeled as stress data, while data acquired during non-stressful conditions were labeled as no-stress data. This binary labeling allowed for the differentiation and analysis of physiological responses during periods of stress versus relaxation. The ground truth was used to train and test the proposed model for detecting mental stress using wearable physiological signal data.
Model evaluation and performance measures
For each ML algorithm, models were trained using 75% of all subject data, with the remaining data used for testing. Each method was also subjected to tenfold cross-validation. This is a popular approach because it provides more consistent estimates of model performance, reduces the risk of overfitting, and makes better use of the available data.
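The tenfold cross-validation scheme can be sketched as an index generator: the samples are shuffled once, divided into ten folds, and each fold serves as the test set exactly once. The sketch assumes sample-level folds, since the text does not say whether folds were grouped by participant; the function name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Twenty hypothetical samples split into ten folds.
splits = list(k_fold_indices(20, k=10))
```

Averaging a performance measure over the ten test folds gives the cross-validated estimate.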
Precision, recall, F1-score, AUC value, and accuracy were used as performance measures. Precision, recall, and accuracy are all essential when evaluating the performance of a mental stress detection system. High precision means that events flagged as stressful are truly stressful, which limits false positives and unnecessary interventions. High recall means that most real stressful events are identified, which limits false negatives that can result in missed intervention opportunities. High accuracy indicates that the system is dependable overall, delivering outcomes that can be relied on for timely and effective interventions. The AUC value provides an overall measure of the system’s ability to discriminate between stressful and non-stressful events by computing the area under the receiver operating characteristic (ROC) curve. A high AUC value indicates a better ability to differentiate between these events, which can lead to more accurate interventions. The F1-score is the harmonic mean of precision and recall; it balances the two, so a high F1-score indicates that the system identifies stressful events while keeping both false positives and false negatives low.
By taking into consideration all of these performance indicators, one may ensure that the system can effectively identify and differentiate between stressful and non-stressful occurrences, allowing for timely and effective interventions that can have a major impact on a person’s psychological well-being and mental health.
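The performance measures discussed above can be computed from the confusion-matrix counts, and the AUC can be computed from classifier scores. The sketch below is a minimal illustration; the function names are hypothetical, and the AUC is computed with the pairwise-ranking formulation, which equals the area under the ROC curve.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def auc(scores_pos, scores_neg):
    """Probability that a random positive scores higher than a random negative."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and scores used only to exercise the functions.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
area = auc([0.9, 0.8], [0.7, 0.85])
```

Note that the F1-score does not involve the true negatives, which is why accuracy and AUC are reported alongside it.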
Results and discussion
In this study, a novel device embedded with ECG, GSR, and ST sensors was used for data collection to classify mental state into stress and no-stress conditions. The collected data was used to train and test nine different ML algorithms for the classification of mental stress.
Multivariate analysis
In this analysis, all the features are considered for training the classifiers. The results of the study are summarized in Table 3 and a graph for all classifier performance is shown in Fig. 4.
Table 3.
Results of different ML algorithms for mental stress classification using all features (multivariate analysis).
| Type | Classifier | Precision | Recall | F1-Score | AUC | Accuracy (%) |
|---|---|---|---|---|---|---|
| ML classifiers | KNN | 95.06 | 95.93 | 95.49 | 97.33 | 95.15 |
| | SVM | 94.86 | 94.81 | 94.84 | 97.77 | 94.97 |
| | DT | 94.34 | 96.72 | 95.52 | 96.12 | 95.66 |
| | RF | 95.09 | 97.23 | 96.21 | 96.03 | 96.03 |
| Bagging | Bagging SVM | 95.43 | 94.99 | 95.17 | 95.21 | 95.66 |
| | Bagging DT | 95.53 | 97.01 | 96.26 | 98.17 | 95.79 |
| Boosting | AdaBoost | 94.70 | 96.98 | 95.83 | 98.34 | 95.52 |
| | GB | 95.03 | 97.02 | 96.16 | 98.72 | 95.87 |
| | XGBoost | 95.15 | 97.44 | 96.29 | 96.14 | 96.17 |
Fig. 4.
Graph for all classifier performance.
According to the findings of this study, all of the tested ML algorithms were capable of classifying mental stress with high accuracy, precision, recall, and F1-score. The highest accuracy was achieved by XGBoost with an accuracy of 96.17%, followed by RF with an accuracy of 96.03%. The ensemble methods (RF, Bagging SVM, Bagging DT, AdaBoost, GB, and XGBoost) generally matched or outperformed the single classifiers (KNN, SVM, and DT), with accuracies of 95.52 to 96.17% compared with 94.97 to 95.66%. This suggests that ensemble learning techniques may be more effective in handling the complexity and variability of the data.
The highest AUC was achieved by GB with an AUC of 98.72%, followed by AdaBoost with an AUC of 98.34%. This suggests that these algorithms are better at distinguishing between the two classes, i.e., stress and no-stress conditions. The outcomes of this study suggest that the device embedded with an ECG sensor, GSR sensor, and ST sensor is a useful tool for the classification of mental stress. The high accuracy and AUC values achieved by the tested ML algorithms indicate that this approach can potentially be used as a reliable and objective tool for mental stress detection and monitoring. Figure 5 shows pairwise scatter plots of all performance measures.
Fig. 5.
Scatter plot graph for all performance measures with each other.
Univariate analysis
The results of the machine learning methods used for mental stress classification with single features, known as univariate analysis, are shown in Table 4. The accuracy of each classifier is reported for each feature of the ECG, GSR, and ST signals.
Table 4.
Results of different ML algorithms for mental stress classification using single feature (univariate analysis). All values are classifier accuracy (in %).
| Signal | Feature name | KNN | SVM | DT | RF | Bagging SVM | Bagging DT | AdaBoost | GB | XGBoost |
|---|---|---|---|---|---|---|---|---|---|---|
| ECG | IBI | 68.28 | 66.53 | 70.58 | 72.62 | 69.37 | 71.32 | 70.42 | 71.37 | 73.92 |
| | SDNN | 82.53 | 79.31 | 82.89 | 84.55 | 80.79 | 84.02 | 81.22 | 82.76 | 85.92 |
| | SDSD | 59.83 | 59.67 | 59.98 | 60.32 | 60.09 | 60.41 | 59.84 | 60.07 | 60.23 |
| | RMSSD | 74.39 | 72.28 | 76.29 | 79.02 | 74.22 | 78.31 | 75.21 | 78.91 | 80.32 |
| | pNN20 | 62.03 | 59.35 | 63.29 | 65.32 | 61.97 | 64.44 | 62.48 | 64.12 | 65.38 |
| | pNN50 | 63.64 | 60.27 | 64.28 | 65.90 | 62.52 | 64.78 | 64.28 | 65.28 | 67.23 |
| | MAD | 55.23 | 54.41 | 57.12 | 60.37 | 55.86 | 58.90 | 60.26 | 62.49 | 67.34 |
| | SD | 65.28 | 62.48 | 67.25 | 69.36 | 64.21 | 69.20 | 67.49 | 69.34 | 71.23 |
| | SD1 | 73.34 | 71.32 | 75.25 | 77.14 | 73.20 | 76.51 | 73.40 | 75.32 | 78.84 |
| | SD2 | 70.27 | 69.02 | 72.38 | 73.21 | 70.34 | 73.29 | 70.21 | 71.48 | 74.25 |
| | SD1/SD2 | 49.08 | 49.14 | 49.02 | 49.18 | 49.11 | 49.07 | 48.87 | 49.23 | 49.06 |
| GSR | FT | 72.58 | 71.92 | 70.21 | 76.45 | 73.12 | 71.89 | 75.21 | 76.78 | 77.32 |
| | DWT | 69.32 | 67.09 | 70.76 | 71.87 | 68.92 | 73.45 | 72.54 | 73.23 | 75.89 |
| | Mean | 75.14 | 73.21 | 74.76 | 76.34 | 74.45 | 75.56 | 73.89 | 75.12 | 77.01 |
| | Variance | 70.98 | 68.45 | 72.34 | 71.56 | 69.23 | 73.12 | 70.87 | 72.65 | 74.09 |
| | Kurtosis | 67.23 | 66.12 | 68.76 | 69.45 | 66.89 | 71.32 | 67.76 | 69.92 | 70.67 |
| | Skewness | 73.56 | 71.34 | 74.01 | 73.12 | 72.52 | 75.45 | 74.21 | 75.09 | 76.78 |
| | EM | 68.32 | 64.78 | 67.45 | 69.23 | 65.89 | 68.56 | 66.92 | 67.21 | 69.01 |
| ST | Mean | 73.58 | 72.12 | 72.89 | 74.56 | 73.32 | 72.76 | 74.21 | 74.89 | 75.67 |
| | Min–Max | 68.21 | 67.45 | 67.89 | 69.32 | 68.76 | 68.01 | 68.45 | 68.92 | 70.23 |
| | Range | 72.76 | 70.89 | 71.32 | 73.56 | 72.45 | 71.98 | 70.25 | 72.67 | 74.12 |
| | Variance | 75.23 | 72.78 | 74.45 | 75.92 | 74.67 | 74.21 | 74.32 | 75.56 | 77.01 |
| | SD | 70.89 | 68.32 | 70.01 | 71.45 | 70.67 | 70.12 | 69.56 | 70.76 | 72.09 |
| | RMS | 74.56 | 72.89 | 73.45 | 74.54 | 74.21 | 73.67 | 73.12 | 74.45 | 76.09 |
| | Median | 71.23 | 69.76 | 70.56 | 71.98 | 70.76 | 70.32 | 70.09 | 71.56 | 73.01 |
| | MAD | 69.45 | 67.67 | 68.92 | 69.76 | 69.01 | 68.56 | 68.12 | 69.32 | 70.78 |
| | MSAD | 72.32 | 69.67 | 71.98 | 73.23 | 72.09 | 71.56 | 71.01 | 72.45 | 74.01 |
| | AA | 68.75 | 65.12 | 68.45 | 69.67 | 68.92 | 68.32 | 68.89 | 69.45 | 70.67 |
| | WL | 71.56 | 68.86 | 70.76 | 72.21 | 71.32 | 70.76 | 70.45 | 71.98 | 73.23 |
| | ZCR | 70.12 | 67.45 | 69.67 | 70.89 | 70.23 | 69.76 | 69.32 | 70.67 | 72.12 |
| | AC | 73.01 | 70.21 | 72.32 | 73.67 | 72.45 | 71.98 | 70.23 | 72.76 | 74.01 |
Here, XGBoost achieved the highest accuracy for most features. This suggests that XGBoost is a dependable ML method for detecting mental stress based on single features. RF and GB also performed well across many features, indicating their utility in this classification task.
The accuracy for ECG features such as IBI, SDNN, and RMSSD varied between classifiers, but these features yielded relatively high accuracy overall, reaching 73.92%, 85.92%, and 80.32%, respectively, with XGBoost. Among the GSR features, the mean value performed consistently across the ML classifiers, and other GSR features, such as FT and DWT, achieved high accuracy as well. The accuracy for ST features varied with the feature and the ML classifier, with Mean, Variance, and RMS performing well in classifying mental stress. The results in this table highlight the significance of specific features for each signal type. For example, SDNN from ECG, the mean value from GSR, and the variance from ST were all highly useful for mental stress classification.
Evaluation of benchmark datasets
To further validate the robustness and generalization of the proposed model, we evaluated it on two publicly available benchmark datasets, the SWELL-KW dataset and the WESAD dataset, using XGBoost, the best-performing classifier from our study. As there are some differences in the signals available across datasets, we used only the common signals (ECG and EDA/GSR) for this comparison. The preprocessing and feature extraction procedures were standardized to ensure consistency across datasets. The results are summarized in Table 5, demonstrating that the proposed model performs consistently well across datasets, achieving accuracy comparable to prior studies. The comparison highlights the adaptability of the proposed model to diverse physiological signals and stressors.
Table 5.
Performance of proposed model on benchmark datasets.
| Dataset | Signals used | Participants | Stressors used | XGBoost accuracy (%) |
|---|---|---|---|---|
| SWELL-KW19 | ECG, EDA/GSR | 25 | Cognitive stress in workplace tasks | 92.38 |
| WESAD20 | ECG, EDA/GSR | 15 | Baseline, stress, meditation | 94.21 |
| Our collected dataset | ECG, GSR, ST | 200 | Watch funny videos, arithmetic tests, listen to favorite music, stroop color word test | 96.17 |
The SWELL-KW dataset, collected in a workplace scenario, provides a different context, with cognitive stress as the key focus. Despite the changes in stressor types and a smaller sample size, the model attained an accuracy of 92.38%, demonstrating its adaptability in a variety of situations. The WESAD dataset includes stress-inducing scenarios with physiological signals captured at baseline, meditation, and stress levels. The model obtained 94.21% accuracy on WESAD, demonstrating its flexibility for multimodal stress detection tasks. The results show that the proposed model performs well across datasets with varying participant population sizes, stress-inducing situations, and signals. While accuracy on benchmark datasets decreases slightly due to smaller sample sizes and diverse stressor types, the model still performs competitively, indicating its robustness.
In addition to evaluating the robustness of our proposed model, the above benchmark datasets were chosen for their compatibility with the physiological signals used in this study. Relatively few public datasets include signals that overlap with those used by the proposed model, and these two were among them. The model’s strong performance on SWELL-KW (92.38%) and WESAD (94.21%) not only validates its reliability but also demonstrates its adaptability to diverse datasets. This dual approach to validation, using both a newly collected dataset and public benchmarks, balances originality and reproducibility, hence increasing the proposed system’s generalizability.
Performance comparison with state-of-the-art models
To validate the effectiveness of our proposed study, we have compared its performance and other important factors with some previous state-of-the-art models which are described in Table 6.
Table 6.
Comparison of the proposed study with state-of-the-art models.
| References | Dataset size | Sensors used | Stressors | No. of features | Algorithm achieving the highest accuracy | Highest accuracy achieved (%) | Key features |
|---|---|---|---|---|---|---|---|
| 5 | 14 | Pulse sensor | Mental arithmetic task, stroop word color test | 19 | SVM | 94.33 | Focused on PPG signals and ultra-short-term recordings |
| 10 | 8 | ECG, GSR, ST, breathing rate | Physical stress, mental stress, combined stress | 56 | Bagged trees | 94.7 | Focused on combining physical and mental stress detection using ML |
| 11 | 74 | ECG | Public speaking, arithmetic task, horror movie | 26 | SVM | 90.5 | Focused on ultra-short-term HRV analysis using various time lengths |
| 14 | 20 | ECG | 12-h laboratory task | 13 | Decision tree | 93.30 | Focuses on novel use of smart T-shirts for real-time ECG acquisition |
| 34 | 20 | ECG, voice, facial expressions | MIST experiments | – | ResNet50 I3D with TAM | 85.1 | Focuses on creating a real-time deep learning system that effectively integrates multiple modalities |
| 35 | 32 | Pulse wave data | Mental arithmetic tasks | 64 | SVM | 95.26 | Analyzes pulse rate variability multi-domain features over different time lengths; time and frequency domain features perform best, with 3 min optimal for stress |
| 36 | 20 | BPV, EDA, ST, facial expression, auditory data | Everyday work-related tasks | – | Deep neural network model | 94 | The model processes data efficiently for various time windows, resulting in accurate stress detection and classification performance |
| 37 | 27 | ECG, PPG, EDA, seismocardiogram (SCG), ballistocardiogram (BCG), and respiratory effort | Mental arithmetic, N-back, stroop color test | – | C-VI (Collective Variational Inference) method | 85 | The model uses prominent digital signatures, establishing a priori probability density functions to improve classification accuracy |
| 38 | 15 | PPG, GSR | Backward arithmetic subtraction, stroop color word test | 14 | RF | 80 ± 8.31 | RF classifier utilized Gini optimization to enhance prediction accuracy |
| Proposed study | 200 | ECG, GSR, ST | Watch funny videos, arithmetic test, listen to favorite music, stroop color word test | 31 | XGBoost | 96.17 | Focuses on low-cost, multimodal signal integration with robust univariate and multivariate analysis |
Study limitations and future work
This study presents a novel framework for mental stress detection using multimodal wearable sensors and machine learning techniques; still, several limitations should be acknowledged. First, although the dataset contains 200 participants, which is larger than those used in many previous studies, expanding the dataset would improve the generalizability of the model. Additionally, the stressors used in this study may not fully capture the variety of real-world stress conditions, which limits the direct relevance of the findings; other stressors could be explored. Furthermore, our model’s performance in real-time settings has been examined only in a small pilot test, so the influence of environmental noise, movement artifacts, and other external factors on sensor readings remains largely untested.
To address these limitations, future work should explore the collection and use of larger and more diverse datasets to ensure broader robustness and generalizability. Also, additional physiological signals such as EEG, respiration, and blood oxygen levels, or other advanced wearable devices, could provide richer data for improving stress detection accuracy and reliability. Longitudinal studies evaluating the model’s performance over extended periods and under varying conditions are also essential to ensure its adaptability in real-world applications. Developing and integrating advanced interpretability methods (e.g., LIME or SHAP) would make the model’s decisions clearer and more accessible to non-technical users, including healthcare professionals and caregivers. While the proposed system is computationally efficient, additional efforts can focus on further optimizing the model for low-power and resource-constrained environments to allow seamless use in wearable devices. By addressing these areas, the proposed study can pave the way toward a more flexible, accurate, and user-friendly system for mental stress detection and management.
Also, while just two public benchmark datasets were chosen for external validation, the selection was driven by the scarcity of publicly available datasets that include the required physiological signals under a variety of stress-inducing situations. The SWELL-KW and WESAD datasets were chosen because they contain high-quality ECG and GSR/EDA signals acquired using commercial-grade sensors, which are consistent with the data collected in this study. Furthermore, the newly collected dataset of 200 participants, gathered under four different stressors, helps offset this constraint by providing a bigger and more diverse data source than is generally obtainable. This not only strengthens the training but also adds a valuable new resource for the stress detection research community.
Future studies will focus on real-time testing with larger groups of participants to improve the practical usability of the proposed stress detection approach. This will involve gathering live physiological signals from users of our device in dynamic, real-world settings. Real-time validation will be useful in evaluating model performance in the presence of environmental noise, participant movement, and different stressors. It will also enable researchers to optimize the model for real-time stress monitoring applications, ensuring its robustness and adaptability.
Real-time evaluation of the proposed stress detection system
To validate the real-time performance of the proposed system, a real-time pilot study was conducted on 10 new students using the developed device and the same four stressors. Then the physiological signals were collected in real-time and processed through the trained XGBoost model to predict the mental stress state instantly. Table 7 shows the actual and predicted labels for all 10 participants and the accuracy and average confidence score for all four tasks. Out of a total of 40 task samples, 39 were correctly classified, resulting in a real-time classification accuracy of 97.5% and average prediction confidence of 95.3%. A single misclassification occurred during the Stroop test for 1 participant (P5), where the model predicted a no-stress state despite the stress-inducing task. This could be a result of the participant’s delayed or unusual physiological reaction patterns, highlighting the natural variation in human stress responses, which is a well-known problem in real-time bio-signal systems. Despite this small variation, the results show that the system can classify mental stress quickly, accurately, and reliably in a real-world setting. The average latency from signal acquisition to classification was about 1.2 s, indicating that the system is suitable for real-time applications such as student evaluation, workplace wellness, and cognitive workload evaluation.
Table 7.
Real-time classification results for 10 new participants using all four tasks.
| Participants | Tasks | Actual label | Predicted label | Accuracy (%) | Avg confidence (%) |
|---|---|---|---|---|---|
| P1 | Arithmetic test | Stress | Stress | 100 | 96.3 |
| | Stroop test | Stress | Stress | | |
| | Watching funny video | No stress | No stress | | |
| | Listening favorite music | No stress | No stress | | |
| P2 | Arithmetic test | Stress | Stress | 100 | 95.9 |
| | Stroop test | Stress | Stress | | |
| | Watching funny video | No stress | No stress | | |
| | Listening favorite music | No stress | No stress | | |
| P3 | Arithmetic test | Stress | Stress | 100 | 96.5 |
| | Stroop test | Stress | Stress | | |
| | Watching funny video | No stress | No stress | | |
| | Listening favorite music | No stress | No stress | | |
| P4 | Arithmetic test | Stress | Stress | 100 | 96.8 |
| | Stroop test | Stress | Stress | | |
| | Watching funny video | No stress | No stress | | |
| | Listening favorite music | No stress | No stress | | |
| P5 | Arithmetic test | Stress | Stress | 75 | 86.4 |
| | Stroop test | Stress | No stress | | |
| | Watching funny video | No stress | No stress | | |
| | Listening favorite music | No stress | No stress | | |
| P6 | Arithmetic test | Stress | Stress | 100 | 95.8 |
| | Stroop test | Stress | Stress | | |
| | Watching funny video | No stress | No stress | | |
| | Listening favorite music | No stress | No stress | | |
| P7 | Arithmetic test | Stress | Stress | 100 | 96.7 |
| | Stroop test | Stress | Stress | | |
| | Watching funny video | No stress | No stress | | |
| | Listening favorite music | No stress | No stress | | |
| P8 | Arithmetic test | Stress | Stress | 100 | 96.4 |
| | Stroop test | Stress | Stress | | |
| | Watching funny video | No stress | No stress | | |
| | Listening favorite music | No stress | No stress | | |
| P9 | Arithmetic test | Stress | Stress | 100 | 95.5 |
| | Stroop test | Stress | Stress | | |
| | Watching funny video | No stress | No stress | | |
| | Listening favorite music | No stress | No stress | | |
| P10 | Arithmetic test | Stress | Stress | 100 | 96.0 |
| | Stroop test | Stress | Stress | | |
| | Watching funny video | No stress | No stress | | |
| | Listening favorite music | No stress | No stress | | |
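The overall real-time accuracy follows directly from the per-participant results in Table 7. The short sketch below reproduces that arithmetic; the variable and function names are hypothetical, and the counts of correct predictions are read from the table (four tasks per participant, with one misclassification for P5).

```python
TASKS_PER_PARTICIPANT = 4

# Number of correctly classified tasks per participant, read from Table 7.
correct = {
    "P1": 4, "P2": 4, "P3": 4, "P4": 4, "P5": 3,
    "P6": 4, "P7": 4, "P8": 4, "P9": 4, "P10": 4,
}


def pilot_accuracy(correct_counts, tasks_per_participant):
    """Overall accuracy in percent across all participants and tasks."""
    total = tasks_per_participant * len(correct_counts)
    return 100.0 * sum(correct_counts.values()) / total


def participant_accuracy(n_correct, tasks_per_participant):
    """Accuracy in percent for a single participant."""
    return 100.0 * n_correct / tasks_per_participant


overall = pilot_accuracy(correct, TASKS_PER_PARTICIPANT)
```

With 39 of 40 task samples classified correctly, the overall accuracy evaluates to 97.5%.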
Conclusion
The present study has proposed a new framework that integrates data from wearable physiological signals with a diverse range of bagging and boosting algorithms to accurately detect mental stress. The proposed model is characterized by its high performance, automation, and relatively lower time and computation complexity. Additionally, the study has created a comprehensive and reliable dataset of 200 participants, which was generated by subjecting them to four different stressors, allowing for the proper training and evaluation of the proposed model. By leveraging a low-cost sensor device, such as the Arduino microcontroller and various sensors (e.g., ECG, GSR, and ST sensors), the proposed model can be an effective and accessible tool for detecting mental stress. Notably, the study’s findings revealed that the RF and XGBoost algorithms with an accuracy of 96.03% and 96.17% were the most effective among the nine different ML algorithms used to classify mental stress. The univariate analysis also demonstrated that XGBoost consistently obtained good accuracy across many single features, highlighting its dependability in detecting mental stress. RF and GB also performed well in a variety of settings. Specific features, such as SDNN for ECG, mean for GSR, and variance for ST, were found to be significant for each signal type. Therefore, it is evident that the proposed model, which integrates low-cost wearable sensors and ML algorithms, has promising prospects for accurately detecting mental stress. Furthermore, the current study’s contributions to the field of mental health research cannot be overstated. Furthermore, validation against benchmark datasets (SWELL-KW and WESAD) demonstrated the model’s adaptability and generalizability with 92.38% and 94.21% accuracy respectively. 
Furthermore, a real-time pilot test involving 10 new participants yielded a live prediction accuracy of 97.5% with low latency (~1.2 s), supporting the practical viability of the proposed model for measuring mental stress in real-world settings. The proposed model offers an automatic and objective approach to detecting mental stress, which can facilitate early intervention and treatment and ultimately improve mental health and well-being. Overall, the findings indicate that the proposed framework has strong potential as a tool for detecting mental stress. In future work, deep learning techniques will be explored to improve the system’s efficiency.
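For illustration, the three single features named above as significant (SDNN for ECG, mean for GSR, and variance for ST) could be computed for one recording window roughly as in the following Python sketch. This is not the authors' implementation: the function names, the use of RR intervals in milliseconds as the ECG-derived input, and the choice of sample (n − 1) statistics are assumptions made here for the example.

```python
import statistics


def sdnn(rr_intervals_ms):
    """SDNN: standard deviation of the RR (normal-to-normal) intervals, in ms.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return statistics.stdev(rr_intervals_ms)


def window_features(rr_intervals_ms, gsr_samples, st_samples):
    """Return one feature per signal for a single recording window.

    Hypothetical helper: the dictionary keys are illustrative names only.
    """
    return {
        "ecg_sdnn": sdnn(rr_intervals_ms),             # ECG: SDNN of RR intervals
        "gsr_mean": statistics.fmean(gsr_samples),     # GSR: mean level
        "st_variance": statistics.variance(st_samples),  # ST: sample variance
    }


if __name__ == "__main__":
    # Made-up values for demonstration; not taken from the study's dataset.
    features = window_features(
        rr_intervals_ms=[800, 810, 790, 805, 795],
        gsr_samples=[2.0, 2.5, 3.0],
        st_samples=[33.0, 33.5, 34.0],
    )
    print(features)
```

Feature vectors of this kind, computed per window, are the sort of input that classifiers such as RF or XGBoost would receive in a univariate or multivariate analysis.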
Acknowledgements
We would like to thank all the participants from RIMS, Ranchi, and BIT, Mesra, Ranchi, for their cooperation in this project.
Author contributions
S.G.: writing of the original draft, conceptualization, and methodology. S.D.: review and editing of the draft, supervision. R.J.: review and editing of the draft, co-supervision.
Funding
The authors report no funding.
Data availability
Data for this research paper is available upon request. Please contact the corresponding author for access.
Declarations
Competing interests
The authors declare no competing interests.
Ethical approval
All procedures conducted in studies that involve human participants were conducted in compliance with the ethical standards of the institutional research committee. The study was also conducted in accordance with the 1964 Helsinki Declaration, as well as its later amendments, or comparable ethical standards.
Informed consent
Informed consent was obtained from all individual participants included in the study. The participants were given clear and straightforward information on the nature of the study, its objectives, potential risks, and advantages, and they were given a chance to ask any queries they had before recording their data.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. McEwen, B. S. Central effects of stress hormones in health and disease: Understanding the protective and damaging effects of stress and stress mediators. Eur. J. Pharmacol. 583(2–3), 174–185 (2008).
- 2. Betts Razavi, T. Self-Report Measures: An Overview of Concerns and Limitations of Questionnaire Use in Occupational Stress Research. University of Southampton - Department of Accounting and Management Science, Papers (2001).
- 3. Gedam, S. & Paul, S. A review on mental stress detection using wearable sensors and machine learning techniques. IEEE Access 9, 84045–84066. 10.1109/ACCESS.2021.3085502 (2021).
- 4. McEwen, B. S. Physiology and neurobiology of stress and adaptation: central role of the brain. Physiol. Rev. 87(3), 873–904. 10.1152/physrev.00041.2006 (2007).
- 5. Zubair, M. & Yoon, C. Multilevel mental stress detection using ultra-short pulse rate variability series. Biomed. Signal Process. Control 57, 101736. 10.1016/j.bspc.2019.101736 (2020).
- 6. Affanni, A. Wireless sensors system for stress detection by means of ECG and EDA acquisition. Sensors 20, 2026. 10.3390/s20072026 (2020).
- 7. Mozos, O. M. et al. Stress detection using wearable physiological and sociometric sensors. Int. J. Neural Syst. 10.1142/S0129065716500416 (2016).
- 8. Li, R. & Liu, Z. Stress detection using deep neural networks. BMC Med. Inform. Decis. Mak. 20(Suppl 11), 285. 10.1186/s12911-020-01299-4 (2020).
- 9. Tiwari, S. & Agarwal, S. A shrewd artificial neural network-based hybrid model for pervasive stress detection of students using galvanic skin response and electrocardiogram signals. Big Data 9(6), 427–442. 10.1089/big.2020.0256 (2021).
- 10. Umer, W. Simultaneous monitoring of physical and mental stress for construction tasks using physiological measures. J. Build. Eng. 1(46), 103777. 10.1016/j.jobe.2021.103777 (2022).
- 11. Lee, S. et al. Mental stress assessment using ultra short term HRV analysis based on non-linear method. Biosensors 12(7), 465. 10.3390/bios12070465 (2022).
- 12. Kim, N., Lee, S., Kim, J., Choi, S. Y. & Park, S. M. Shuffled ECA-Net for stress detection from multimodal wearable sensor data. Comput. Biol. Med. 183, 109217. 10.1016/j.compbiomed.2024.109217 (2024).
- 13. Bin Heyat, M. B. et al. Wearable flexible electronics based cardiac electrode for researcher mental stress detection system using machine learning models on single lead electrocardiogram signal. Biosensors 12(6), 427. 10.3390/bios12060427 (2022).
- 14. Lee, J., Lee, H. & Shin, M. Driving stress detection using multimodal convolutional neural networks with nonlinear representation of short-term physiological signals. Sensors (Basel) 21(7), 2381. 10.3390/s21072381 (2021).
- 15. Panicker, S. S. & Gayathri, P. A survey of machine learning techniques in physiology based mental stress detection systems. Biocybern. Biomed. Eng. 39(1), 341–355 (2019).
- 16. Spielberger, C. D. State-Trait Anxiety Inventory for Adults (Sampler Set, Mind Garden, Palo Alto, CA, 1983).
- 17. Cohen, S. & Williamson, G. Perceived stress in a probability sample of the United States. In The social psychology of health: Claremont symposium on applied social psychology (eds Spacapan, S. & Oskamp, S.) 31–67 (Sage, 1988).
- 18. Koldijk, S., Sappelli, M., Verberne, S., Neerincx, M. A. & Kraaij, W. The SWELL Knowledge Work Dataset for Stress and User Modeling Research. in Proceedings of the 16th International Conference on Multimodal Interaction (ICMI 14) 291–298 (Association for Computing Machinery, New York, NY, USA, 2014). 10.1145/2663204.2663257.
- 19. Schmidt, P., Reiss, A., Duerichen, R., Marberger, C. & Van Laerhoven, K. Introducing WESAD, a Multimodal Dataset for Wearable Stress and Affect Detection. in Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 18) 400–408 (Association for Computing Machinery, New York, NY, USA, 2018). 10.1145/3242969.3242985.
- 20. Rachakonda, L., Mohanty, S. P., Kougianos, E. & Sundaravadivel, P. Stress-lysis: A DNN-integrated edge device for stress level detection in the IoMT. IEEE Trans. Consum. Electron. 65(4), 474–483 (2019).
- 21. Rachakonda, L., Sundaravadivel, P., Mohanty, S. P., Kougianos, E. & Ganapathiraju, M. A Smart Sensor in the IoMT for Stress Level Detection. in Proceedings of the 4th IEEE International Symposium on Smart Electronic Systems (iSES) 141–145 (2018).
- 22. Langlois, D., Chartier, S. & Gosselin, D. An introduction to independent component analysis: InfoMax and FastICA algorithms. Tutor. Quant. Methods Psychol. 6(1), 31–38 (2010).
- 23. Pham, T., Lau, Z. J., Chen, S. H. A. & Makowski, D. Heart rate variability in psychology: A review of HRV indices and an analysis tutorial. Sensors 21(12), 3998. 10.3390/s21123998 (2021).
- 24. Shimomura, Y. et al. Use of frequency domain analysis of skin conductance for evaluation of mental workload. J. Physiol. Anthropol. 27, 173–177. 10.2114/jpa2.27.173 (2008).
- 25. Hu, L. Y. et al. The distance function effect on k-nearest neighbor classification for medical datasets. Springerplus 5(1), 1304. 10.1186/s40064-016-2941-7 (2016).
- 26. Bishop, C. M. Pattern recognition and machine learning (Springer, 2006).
- 27. Hastie, T., Tibshirani, R. & Friedman, J. H. The elements of statistical learning: data mining, inference, and prediction 2nd edn. (Springer, 2009).
- 28. Breiman, L. Random forests. Mach. Learn. 45(1), 5–32. 10.1023/A:1010933404324 (2001).
- 29. Kim, H. C., Pang, S., Je, H. M., Kim, D. & Bang, S. Y. Support vector machine ensemble with bagging. In Pattern Recognition with Support Vector Machines: First International Workshop, SVM 2002 Niagara Falls, Canada, August 10, 2002 Proceedings (eds Schoelkopf, B. et al.) 397–408 (Springer, 2002).
- 30. Dietterich, T. G. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn. 40, 139–157 (2000).
- 31. Schapire, R. E. Explaining adaboost. In: Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik. 37–52 (2013).
- 32. Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 1, 1189–1232 (2001).
- 33. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 785–794 (2016).
- 34. Zhang, J. et al. Real-time mental stress detection using multimodality expressions with a deep learning framework. Front. Neurosci. 16, 947168. 10.3389/fnins.2022.947168 (2022).
- 35. Jiao, Y. et al. Feasibility study for detection of mental stress and depression using pulse rate variability metrics via various durations. Biomed. Signal Process. Control 79, 104145 (2023).
- 36. Dogan, G. & Akbulut, F. P. Multi-modal fusion learning through biosignal, audio, and visual content for detection of mental stress. Neural Comput. Appl. 35, 24435–24454 (2023).
- 37. Zhou, Y. et al. Inference-enabled tracking of acute mental stress via multi-modal wearable physiological sensing: A proof-of-concept study. Biocybern. Biomed. Eng. 44(4), 771–781 (2024).
- 38. Barik, S. et al. Detection of stress from PPG and GSR signals using AI framework. J. Inst. Eng. India Ser. B 10.1007/s40031-024-01191-z (2025).