Analyzing mental stress in Indian students through advanced machine learning and wearable technologies

Shruti Gedam; Sandip Dutta; Ritesh Jha

doi:10.1038/s41598-025-06918-6

. 2025 Jul 1;15:20610. doi: 10.1038/s41598-025-06918-6

PMCID: PMC12214839 PMID: 40595085

Abstract

Mental stress is a prevalent issue in modern society, and detecting and classifying it accurately is crucial for effective interventions and treatment plans. This study aims to compare various machine learning (ML) algorithms for detecting mental stress using wearable physiological signal data and proposes a novel model that is automatic, high-performing, low-cost, and with lower time and computation complexity. The proposed model was trained and tested on a dataset of 200 participants, which involves applying four different stressors. Nine ML algorithms were investigated for both multivariate and univariate features. The physiological data was collected using a novel device developed using an Arduino microcontroller and low-cost sensors such as ECG, GSR, and ST sensors. The findings reveal that the suggested model detects mental stress with an accuracy of 96.17%, with the XGBoost method outperforming other algorithms in multivariate analysis. Univariate feature analysis found that XGBoost regularly demonstrated good accuracy, showing its dependability for detecting mental stress. The novel device created using low-cost sensors and automatic, high-performing algorithms is an effective and accessible tool for mental stress detection. Additionally, benchmark dataset validation (SWELL-KW, WESAD) confirmed the model’s robustness with accuracies of 92.38% and 94.21% respectively. A real-time pilot test on ten new participants utilizing the developed device validated the model’s practical value, with 97.5% classification accuracy and low latency. This study provides insights into the most effective ML algorithms for mental stress detection and creates a comprehensive and reliable resource for future research.

Keywords: ECG, GSR, Machine learning, Mental stress, Physiological signal, ST

Subject terms: Health care, Engineering

Introduction

Stress is the normal feeling of difficulty in dealing with specific tasks and conditions. It is the body’s reaction to a challenge or demand. In brief moments, stress can be useful, such as when it helps you avoid danger or meet a deadline. When stress lasts for a prolonged period, it can develop into a chronic condition if no steps are taken to manage it, and it can have an impact on your health¹. Stress, by creating mind–body modifications, directly leads to psychological and physiological disorders and disease, affects mental and physical health, and degrades the quality of life. Everyone’s reaction to stress differs; the same event can be distressing for one person but not for another. For some people, just thinking about challenging work can trigger stress, and there is no obvious explanation for why one person feels less stressed than another when confronted with an identical stressor. There are several ways to measure a person’s mental stress. One of the most common is self-report measures, where individuals report their level of stress using standardized questionnaires. However, self-report measures can be subjective and may not provide an accurate representation of the actual level of stress². Another way to measure mental stress is through physiological measures. This involves monitoring the body’s response to stress through various physiological signals such as heart rate, electroencephalogram (EEG), electrodermal activity (EDA), and event-related potentials (ERP). These signals are typically measured using specialized devices such as heart rate monitors, EEG machines, and skin conductance sensors.

In recent years, ML and artificial intelligence techniques have also been applied to detect and measure mental stress. These techniques involve using algorithms to analyze physiological signals and patterns in behavior to identify signs of stress. Overall, measuring mental stress can be a complex and multidimensional process, and a combination of different methods may provide the most accurate assessment of an individual’s stress level. Electrocardiogram (ECG), Galvanic Skin Response (GSR), and skin temperature (ST) are all physiological measures that are sensitive to changes in autonomic nervous system (ANS) activity and are frequently used in mental stress detection studies³. In reaction to internal and external stimuli, the ANS regulates the body’s internal processes such as heart rate, blood pressure, and sweating. When an individual is under stress, their ANS responds by activating the sympathetic nervous system (SNS), which prepares the body for a fight-or-flight response. This activation causes a number of physiological changes, including a rise in heart rate, vasoconstriction, and sweating⁴.

The ECG is used to evaluate the electrical activity of the heart and can detect variations in heart rate and heart rate variability (HRV) that occur during stress. GSR detects changes in the electrical conductivity of the skin, which can be affected by changes in sweat gland activity. Changes in blood flow and sweat gland activity also affect ST, which can be used to recognize variations in SNS activity. Researchers can gain insights into the physiological changes that occur during stress by measuring changes in ECG, GSR, and ST, and develop methods to identify and manage stress in real-life situations.

The following are the study’s unique contributions:

This study proposes a novel ML framework for detecting mental stress using wearable physiological signal data and various bagging and boosting algorithms, providing insights into the most effective algorithm for mental stress detection.
The experimental results show that RF and XGBoost are the most effective of the nine ML algorithms tested, with 96.03% and 96.17% accuracy, respectively.
Individual physiological signals (ECG, GSR, and ST) are examined using univariate analysis to find unique patterns and features related to mental stress. The multivariate analysis takes into account all signals at the same time, exploiting their combined impact and interactions to improve stress detection accuracy.
To train and test the suggested model, this study developed a new and larger dataset of 200 participants. The dataset was created by using four distinct effective stressors, making it a comprehensive and dependable resource for mental stress detection research.

Related work

The use of wearable sensors for mental stress detection has been an active research area for some years. Numerous studies have investigated the usage of different physiological signals and ML algorithms for detecting mental stress using wearable sensors. Wearable sensors have become increasingly popular for monitoring physiological data for detecting mental stress. Among the different types of physiological data, ECG, GSR, and ST are some of the most commonly used signals due to their importance in assessing cardiovascular health, emotional arousal, and thermal regulation, respectively. Here, we present a detailed related work section, discussing the most related and current studies in this field.

Literature review

Zubair Muhammad and Yoon Changwoo⁵ selected 14 volunteers and collected data with a commercial Pulse sensor. The approach extracts characteristics from PPG signals and offers a new set of features to quantify temporal information. Using SVM, the proposed method classified five different levels of mental stress with an accuracy of 94.33%. Also, the system was tested on a different stressor dataset, demonstrating its capacity to detect diverse mental stress states employing ultra-short-term recordings from a cost-effective PPG sensor. This research demonstrates the suggested system’s capacity to detect and quantify mental stress levels. The study⁶ proposes a multi-sensor approach for detecting stress using physiological and sociometric sensors. The physiological sensors include ECG, respiration, and skin conductance, while the sociometric sensor is a wearable microphone. The authors collected data from 25 participants who were subjected to two stress-inducing tasks. They used several ML algorithms, including SVM, KNN, and decision tree, to classify stress. The results showed that combining physiological and sociometric sensors led to better stress classification compared to using physiological sensors alone. The best accuracy achieved was 87% using SVM. The study’s limitations included the small sample size, the limited number of stressors used, and the use of only healthy participants. The study⁷ involved 15 participants, and the sensors used were EDA sensors and ECG sensors for data collection and stress detection. The participants performed a stress-inducing task of public speaking, and their physiological data were collected during the task. The data was then used to extract features, which were fed into ML algorithms to classify stress levels. The system achieved an accuracy of 75.9% for stress level classification. 
The study also highlighted the potential of incorporating sociometric data, such as facial expressions and voice patterns, to improve stress detection accuracy. The study’s disadvantages include a small number of participants and the need for additional research to determine the system’s efficacy in real-world scenarios.

The study⁸ presents a novel method for detecting stress by combining deep neural networks (DNNs) with ECG and EDA. The authors collected physiological data from 50 participants who performed different stress-inducing tasks, including public speaking and mental arithmetic. They then preprocessed the data and trained three DNN models with different architectures using various combinations of ECG and EDA signals. The authors achieved an overall accuracy of 87.3% for stress detection with the best-performing model using both ECG and EDA signals. The study suggests that DNNs can be a promising approach to accurately detect stress using physiological signals, and the proposed method can be used in real-world settings for stress monitoring and management. The study⁹ proposes a hybrid model for detecting stress using real-time data analytics and the Internet of Things. The study focuses on the use of GSR and ECG sensors to monitor the physiological parameters of 34 participants while they undertake five different tasks designed to induce stress. The study computes the accuracy of different ML models, including Logistic Regression, Support Vector Machine, K-Nearest Neighbors, Bagging Classifiers, Random Forest, Gradient Boosting, and Artificial Neural Network, to classify the mental state of the participants into four categories: relaxed, stressed, partially stressed, and happy. The hybrid model used a synthetic minority oversampling technique to deal with the imbalance class problem and achieved a high accuracy rate of 99.4% on the self-generated dataset. The study emphasizes the possibility of applying real-time data analytics to improve the quality of healthcare services, such as stress detection and diagnosis. Wearable sensors and machine learning algorithms have shown promising results for detecting mental stress. However, limitations such as small sample sizes and limited stressors used in studies need to be addressed.

This study¹⁰ focuses on the feasibility of tracking both physical and mental stress in construction workers using physiological signals and machine learning. The study emphasizes the necessity for a thorough stress evaluation, emphasizing the interrelationship of physical and mental stress. Data were acquired from 8 volunteers who wore a multi-sensor vest that assessed their heart rate, skin temperature, breathing rate, and skin conductance while performing physical and mental tasks. ML algorithms accurately classified stress levels, achieving up to 94.7% for simultaneous monitoring with bagged trees. The results showed that integrating physiological signals and applying person-specific normalization considerably enhanced prediction ability. Lee et al.¹¹ proposed a model using ultra-short-term HRV analysis with EMD-derived features to detect mental stress. The 26 features were evaluated using data from 74 police officers who were exposed to acute stressors such as the Trier Social Stress Test (TSST) and horror movie screenings. Using an SVM classifier, the approach obtained an accuracy of up to 90.5%. According to this study, 2–3 min of HRV data are adequate for reliable stress detection, indicating the possibility for real-time applications in wearable devices. The study ¹² presents Shuffled ECA-Net, a deep learning model for stress detection that uses multimodal wearable sensors to measure ECGs, respiratory waveforms, and electrogastrograms (EGGs) with their own developed device. Stress levels were identified using salivary cortisol to ensure objectivity. The model uses a unique Shuffled Efficient Channel Attention module to improve sensor fusion by considering inter-modality interactions. When evaluated with five-fold cross-validation, it outperformed baseline models (accuracy: 0.916, AUROC: 0.964). While multimodal fusion enhanced stress detection, the model’s generalizability was limited by a small 26-participant dataset and low inter-subject performance. 
The work focuses on practical, non-intrusive stress monitoring, with the potential for real-world applications and future optimization. This study¹³ develops an automatic technique for detecting mental stress in researchers using single-lead ECG readings acquired via a wearable smart T-shirt. Data from 20 male researchers was collected, with 1,800 min of ECG data divided into 1-min intervals. Decision Tree (DT) models outperformed the others, obtaining 93.30% accuracy for intra-subject classification and 94.10% for inter-subject classification. 3 HRV features were extracted. Flexible dry electrodes enabled pleasant and quick data collecting. The method has exceeded previous approaches, demonstrating its promise for real-time, non-invasive stress monitoring in high-risk settings. The study¹⁴ introduces a novel method for detecting driving stress that employs nonlinear representations of short-term physiological variables (30 s or less) and multimodal convolutional neural networks (CNNs). The method transforms GSR (hand and foot) and HR data into continuous recurrence plots (Cont-RPs), which are then analyzed by CNNs for stress-related characteristics. These features are integrated to create a representation vector for categorization. When tested on a real-world driving dataset of nine people, the methodology obtained 95.67% accuracy for 30-s signals and 92.33% for 10-s signals, exceeding earlier methods and proving efficacy even with short-term data. This emphasizes the possibility of real-time stress detection in driving scenarios.

Despite these promising results, more study is needed to enhance the detection mechanism indicated in these studies. Some studies, for example, have highlighted the need for more diverse and larger datasets to improve algorithm robustness and generalizability³,¹⁵. This study proposes a new approach using physiological data collected through wearable sensors. It will involve a larger and more diverse participant population and use a variety of stressors to improve generalizability. The approach has the potential to improve stress management and monitoring in real-world settings.

Research gap

While the previous studies made significant advances in the area of detection of mental stress using wearable sensors and ML techniques, there are still some gaps that are worth highlighting to further advance this research field:

In most studies, the dataset contains fewer than 100 participants, which restricts the robustness and generalizability of the findings. This paper addresses this gap with a dataset of 200 participants, larger than those in many existing works, collected under four diverse stress-inducing scenarios to improve model performance.
Prior studies often use a limited number of stressors that do not adequately represent the variety of real-world stress conditions, which makes it difficult to generalize models to real-life applications. This study includes four different stressors to increase data variability and generate more data points.
While some previous work uses physiological signals such as ECG or EDA, only a few studies examine the combined impact of multiple signals such as ECG, GSR, and ST. This study applies a multimodal approach to provide richer information and improve stress detection accuracy. Univariate and multivariate analyses are also performed to show the importance of combining modalities for better accuracy.
Many studies rely on high-cost wearable devices for data collection, which may limit their practicality and scalability in real-world settings, especially in low-resource environments. This study proposes a low-cost and computationally efficient system that integrates wearable sensors and ensemble ML techniques to improve accessibility and practicality.

Experimental protocols

This section outlines the experimental procedures used to gather data on mental stress using an IoT device that included ECG, GSR, and ST sensors. Real-time monitoring of the physiological reactions associated with stress was made possible by the development of this system and the selection of certain sensors.

IoT device and sensors (setup & placement)

A device has been developed that utilizes three sensors, namely an ECG sensor (Heart Rate monitor AD 3282), a GSR module (Grove GSR_sensor), and an ST sensor (DS18B20 temperature sensor). The device is depicted in Figs. 1(a) and 1(b). Data collection was performed with this device following the study protocol outlined in Fig. 2, and a participant wearing all the sensors is shown in Fig. 1(c). In addition to the aforementioned sensors, the device comprises other components such as a USB power supply (to provide power to the device), a Real-Time Clock (RTC) module (for timekeeping), a Micro SD card module (for data storage), an Arduino Mega and UNO (for sensor connections), and a TFT LCD display (for real-time display of ECG, GSR, and ST values). The ECG sensor uses three electrodes (pads) placed on the chest in specific locations to detect the electrical signals from the heart. The GSR sensor was placed on the fingers of the hand, on the palmar surface of the distal joint of the index and middle fingers. This is the area of the finger that has the highest density of sweat glands, which makes it the most sensitive location for measuring changes in skin conductance. The DS18B20 temperature sensor was placed under the armpit of the participants. This is a suitable location because it is easy to access and provides a relatively stable and consistent temperature measurement. When placing the sensor under the armpit, it is essential to ensure that the sensor is in contact with the skin and that the sensor tip is positioned in the center of the armpit.

Fig. 1 — (a) IoT device with ECG sensor, GSR module, and ST sensor. (b) Prototype and components of the IoT device, (c) a participant wearing IoT device during stressor test [A—ECG sensor, B—GSR sensor, C—ST sensor, D—IoT device for collecting data].

Participants and study protocol

The purpose of this study is to evaluate stress responses in a controlled environment, using a diverse group of participants to ensure that the findings are generalizable. The study approach was designed to accurately capture physiological and psychological stress markers by recruiting healthy volunteers and conducting standardized stress-inducing tasks. Details on the participants and the study protocol are provided below.

Participants details

For this experiment, 200 undergraduate students (128 males and 72 females) were recruited. 120 subjects were from Rajendra Institute of Medical Sciences (RIMS), Ranchi, India, and 80 subjects were from Birla Institute of Technology (BIT), Mesra, Ranchi, India. The participants were all in excellent health and ranged in age from 20 to 26 years. None of the participants had a chronic condition such as a cardiovascular or mental disorder. Following an explanation of the experiment’s goal and protocol, participants were screened using a self-report questionnaire. Prior to data collection, this questionnaire was also used to gather personal information such as name, age, gender, dominant hand, height, weight, blood group, education, hometown, annual family income, and the number of siblings (excluding the participant). For twelve hours before the data-collecting procedure, all participants were asked not to consume any medicine, alcohol, coffee, or tea. The participants were instructed not to speak and to limit their movement during the test. They were seated in a comfortable room in front of two computers, one for completing stressor activities and the other for monitoring the timer for each stressor. They also completed two standard questionnaires, the State-Trait Anxiety Inventory (STAI)¹⁶ and the Perceived Stress Scale with 10 questions (PSS-10)¹⁷.

The participants’ socioeconomic backgrounds are diverse, as shown by a self-reported questionnaire that asked about family income, education, hometown, and other socioeconomic factors. The gender distribution is 128 males (64%) and 72 females (36%), providing a reasonable gender balance. The age range of 20 to 26 years is typical of young people, who are an important group in stress research, particularly for academic pressures. However, while this sample provides useful insights into stress detection in young adults, the findings may not be directly applicable to other age groups, such as older adults or children, or to populations with distinct health profiles, such as those with pre-existing diseases. The questionnaires were employed for additional demographic and psychological profiling. Although the dataset prioritizes a specific age group (as it specifically focuses on Indian Students) and demographic, it was created to replicate real-life mental stress scenarios using four different stress-inducing exercises. This controlled environment ensures the accuracy of the acquired data while also providing a solid platform for stress classification research.

Study protocol details

Four stressors, illustrated in Fig. 2, are used in this study to induce stress in participants. They are a combination of psychological, physical, and environmental stressors intended to provide a comprehensive evaluation of the participant’s response to stress. Each stressor lasted 2.5 min, which kept trials consistent and allowed precise observation of physiological and psychological stress markers. The ground truth of this study is described in section "Ground truth". To minimize environmental noise, data collection was conducted in a calm laboratory setting. Participants were comfortably seated in front of a screen, with little movement to ensure optimal signal acquisition. Each session recorded one participant at a time and lasted about 10 min, with data acquired for four tasks:

Watch funny videos- A non-stressful activity used to record physiological signals during relaxation. It is classified as both a psychological and an environmental stressor.
Arithmetic test- It puts participants under cognitive stress by challenging them to answer mathematical problems under time limits. It is a psychological stressor that increases cognitive load and causes stress through the pressure of performance and time constraints.
Listen to favorite music- This activity was meant to provide psychological and environmental relief by involving participants in a calming and enjoyable activity, and was used to evaluate recovery and relaxation. Music affects emotional states and may prevent stress-induced changes.
Stroop Color Word Test- This test challenged participants to identify the ink color of words spelling out different colors, which pits automatic word reading against deliberate color naming. It is a psychological stressor that assesses attention, cognitive control, and interference management skills.
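
The timing described above (four tasks of 2.5 min each, about 10 min per session) can be sketched as a simple segmentation of one recording. This is only an illustrative sketch: the task order, the back-to-back layout, the sampling rate, and the names `TASKS` and `segment_session` are assumptions, not details taken from the study.

```python
# Hypothetical sketch: split one ~10-minute recording into four 2.5-minute
# task segments. Task order and back-to-back layout are assumptions.
TASKS = ["funny_video", "arithmetic_test", "favorite_music", "stroop_test"]
TASK_SECONDS = 150  # 2.5 min per task, as stated in the protocol


def segment_session(n_samples, fs_hz):
    """Return {task: (start_index, end_index)} with end exclusive."""
    per_task = int(TASK_SECONDS * fs_hz)
    segments = {}
    for i, task in enumerate(TASKS):
        start = i * per_task
        if start >= n_samples:
            raise ValueError(f"recording too short to contain task {task!r}")
        # Clip the last segment if the recording ends slightly early.
        segments[task] = (start, min(start + per_task, n_samples))
    return segments


# Example with an assumed 10 Hz sampling rate and a full 10-minute recording.
example_segments = segment_session(n_samples=6000, fs_hz=10)
```

Each index pair can then be used to slice the ECG, GSR, and ST arrays so that features are computed per task.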

Ethical approval statement

The experimental protocols used in this study were approved by the Department of CSE, Birla Institute of Technology (BIT), Mesra, Ranchi, India (Approval No: CSE/HoD/Certificate/2023–24/164). All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed consent statement

Informed consent was obtained from all individual participants included in the study for the publication of identifying information/images in an online open-access publication. For the participant image provided in Fig. 1(c), specific informed consent was obtained from the participant for the use of his image in this publication.

Device validation methodology

The reliability of the developed device was validated by comparing its ECG and GSR measurements with the benchmark datasets SWELL-KW¹⁸ and WESAD¹⁹, which were collected using commercial-grade sensors. The SWELL-KW dataset does not include ST data, so only the WESAD dataset’s ST sensor data was used, along with the Stress Lysis dataset²⁰,²¹, for validating the ST data. Table 1 summarises the comparison results, which show strong alignment between the developed device’s measurements and the benchmark datasets, supporting its reliability for physiological data collection.

Table 1.

Comparison details about the device for validation.

Signal	Parameter	Developed device (Mean ± SD)	SWELL-KW (Mean ± SD)	WESAD (Mean ± SD)	Deviation	Commercial device (SWELL-KW)	Commercial device (WESAD)
ECG	IBI (ms)	935–1024	910–1050	900–1100	< 5%	MOBI device (TMSI)	RespiBAN Professional
GSR	Skin conductance (µS)	3.42 ± 0.15	~3.40 ± 0.14	~3.50 ± 0.18	< 3%	MOBI device (TMSI)	RespiBAN Professional
ST	Skin temperature (°C)	34.1 ± 0.3	N.A.	33–36	< 2%	–	RespiBAN Professional

SWELL-KW’s MOBI device (TMSI) is a research-grade tool for ECG and GSR measurements, with high physiological data acquisition accuracy. The RespiBAN Professional, used for WESAD, is a chest-worn multisensor device capable of recording ECG, GSR, and temperature data with high precision. The ST sensor readings (mean ± SD: 34.1 ± 0.3 °C) fell within the normative range (33–37 °C) reported in the Stress Lysis dataset. ECG IBI readings (< 5% deviation), GSR readings (< 3% deviation), and ST readings (< 2% deviation) are well aligned with the commercial-grade sensors used in the SWELL-KW and WESAD datasets, indicating the device’s reliability in providing physiological data.
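
The text does not state how the deviation figures were computed. One plausible reading, shown here purely as an assumption, is the relative difference between the device mean and the benchmark mean. The sketch below applies that definition to the GSR means reported in Table 1; the function name `percent_deviation` is hypothetical.

```python
def percent_deviation(device_mean, reference_mean):
    """Relative difference of the device mean from a reference mean, in percent.

    Assumed definition; the source does not specify the formula it used.
    """
    return abs(device_mean - reference_mean) / reference_mean * 100.0


# GSR means from Table 1 (microsiemens).
gsr_vs_swell = percent_deviation(3.42, 3.40)  # roughly 0.6 %
gsr_vs_wesad = percent_deviation(3.42, 3.50)  # roughly 2.3 %
```

Under this definition both GSR comparisons fall below the 3% deviation reported in the table.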

Benchmark datasets

To validate the reliability of the developed device and assess the generalization of the proposed framework, two publicly available benchmark datasets namely SWELL-KW and WESAD were utilized. These datasets contain physiological signals recorded using commercial devices under controlled stress-inducing conditions which can be used as standard references for stress classification models. These datasets assist in comparing the accuracy and dependability of our custom-developed wearable device to commercial-grade sensors. The comparison results are shown in Table 1.

SWELL-KW dataset

This dataset consists of physiological and behavioral data from 25 participants performing knowledge-based work tasks under the following three conditions.

Neutral- It was a baseline condition that included regular working conditions with no stressors.
Time pressure- It was a stress condition in which the task difficulty and deadlines were increased to induce cognitive stress.
Interruptions- It was also a stress condition where frequent external disturbances were created during regular tasks to stimulate workplace distractions.

This dataset consists of ECG and GSR signals collected using the MOBI device (TMSI), which is a high-precision research-grade wearable sensor. This dataset does not include ST measurements.

WESAD dataset

This dataset includes physiological recordings from 15 participants undergoing three experimental conditions.

Baseline- Data was recorded when the participant was in normal resting condition.
Stress- Data was recorded using TSST when the participant was exposed to public speaking and mental arithmetic tasks.
Meditation- Data was recorded when the participant was doing guided breathing exercises for relaxation.

The dataset consists of ECG, GSR, and ST signals collected using the RespiBAN Professional device which is a commercial wearable device.

Methodology

The dataset used in this study was prepared by exposing participants to stressors in order to detect stress, and includes data from 128 males and 72 females. The methodology is illustrated in Fig. 3.

Fig. 3 — Flowchart of mental stress detection model used in this study.

Preprocessing of dataset

Eleven HRV features were extracted from ECG data and then filtered using a notch filter with a cutoff frequency of 0.05. Seven GSR features were obtained from the GSR signal and then filtered using a low-pass Butterworth filter before computing the first derivative of the filtered signal. Additionally, because the GSR signal is non-stationary, the Discrete Wavelet Transform (DWT) was applied to decompose the signal into its frequency components, called approximation and detail coefficients. Thirteen ST features were extracted and then the Butterworth filter was used with a lower frequency cutoff of 5 Hz and a sampling rate of 1000 Hz.
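
To make the GSR steps more concrete, the following is a minimal, dependency-free sketch of a first derivative and a single-level DWT. The source does not name the wavelet or the decomposition level, so the Haar wavelet and one level are assumptions, and the function names are hypothetical. The Butterworth and notch filtering steps are not reproduced here.

```python
import math


def first_derivative(signal):
    """Discrete first derivative: difference between successive samples."""
    return [b - a for a, b in zip(signal, signal[1:])]


def haar_dwt(signal):
    """Single-level Haar DWT. Returns (approximation, detail) coefficients.

    Assumption: Haar wavelet, one level; the study does not specify either.
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])  # pad odd-length input by repeating the last sample
    root2 = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / root2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / root2 for i in range(0, len(x), 2)]
    return approx, detail


# Toy GSR-like samples (arbitrary values, for illustration only).
toy_signal = [4.0, 4.0, 2.0, 0.0]
toy_approx, toy_detail = haar_dwt(toy_signal)
toy_slope = first_derivative(toy_signal)
```

The approximation coefficients carry the slow trend of the signal and the detail coefficients carry the rapid changes; the Haar transform preserves the total signal energy across the two sets.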

Feature selection and computation

Feature selection and computation were done using multivariate and univariate analysis. The list of extracted features from all the sensors used in this study is described in Table 2.

Table 2.

List of extracted features from all the sensors.

Physiological signal source	Types of features	Features	Feature description
ECG	Time-domain	IBI	Inter-beat interval
		SDNN	Standard deviation of intervals between adjacent beats
		SDSD	Standard deviation of successive differences between adjacent R-R intervals
		RMSSD	Root mean square of successive differences between adjacent R-R intervals
		pNN20	Proportion of differences between R-R intervals greater than 20 ms
		pNN50	Proportion of differences between R-R intervals greater than 50 ms
	Statistical	MAD	Median absolute deviation
	Non-linear	SD	Poincare analysis, SD
		SD1	Poincare analysis, SD1
		SD2	Poincare analysis, SD2
		SD1/SD2	Poincare analysis, SD1/SD2
GSR	Frequency-Domain	FT	Fourier Transform
	Frequency-Domain	DWT	Discrete wavelet transform
	Statistical	Mean	Mean
		Variance	Variance
		Kurtosis	Kurtosis
		Skewness	Skewness
	Energy-related	EM	Energy measure
ST	Statistical	Mean	Mean
		Variance	Variance
		SD	Standard deviation
		RMS	Root mean square
		Median	Median
		MAD	Mean absolute deviation
	Range-based	Min–Max	Min–max
	Range-based	Range	Range
	Derived	MSAD	Mean square of the approximate 1st and 2nd derivatives
		AA	Absolute Area
		WL	Waveform length
		ZCR	Zero-crossing rate, number of crossings
		AC	Average value of absolute derivative
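
As an illustration of the ECG rows in Table 2, the sketch below computes several of the listed HRV features from a series of inter-beat intervals using their commonly used definitions. The paper does not give its formulas, so these definitions, the use of sample standard deviation, and the name `hrv_features` are assumptions.

```python
import math
import statistics


def hrv_features(ibi_ms):
    """Common time-domain and Poincare HRV features from inter-beat intervals (ms)."""
    if len(ibi_ms) < 3:
        raise ValueError("need at least three inter-beat intervals")
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    sdnn = statistics.stdev(ibi_ms)   # SD of the intervals
    sdsd = statistics.stdev(diffs)    # SD of successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    sd1 = sdsd / math.sqrt(2.0)       # Poincare short-term axis
    # Poincare long-term axis; clamp at 0 because the identity can go
    # slightly negative on very short series.
    sd2 = math.sqrt(max(2.0 * sdnn ** 2 - 0.5 * sdsd ** 2, 0.0))
    return {
        "mean_ibi": statistics.fmean(ibi_ms),
        "sdnn": sdnn,
        "sdsd": sdsd,
        "rmssd": rmssd,
        "pnn20": sum(abs(d) > 20 for d in diffs) / len(diffs),
        "pnn50": sum(abs(d) > 50 for d in diffs) / len(diffs),
        "sd1": sd1,
        "sd2": sd2,
    }


# Toy intervals in milliseconds (not data from the study).
toy_hrv = hrv_features([800, 850, 790, 900, 810])
```

Here pNN20 and pNN50 are returned as proportions between 0 and 1; multiply by 100 for percentages.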

Feature selection for multivariate analysis

For multivariate analysis, features from all three physiological signals (ECG, GSR, and ST) were combined in order to take advantage of their beneficial nature. By combining features from several signals, the multivariate technique provides a comprehensive perspective of stress-induced physiological changes, allowing for robust and reliable stress detection.

Feature extraction- Feature extraction was applied to the ECG, GSR, and ST data, and 31 features were extracted based on their physiological relevance and sensitivity to the stress response.
Pre-processing- The FastICA algorithm²² was applied to the input data to find the independent components that were most informative for the given problem. The extracted components were then used as features for training and testing the ML model, with 75% of the data used for training and the remaining 25% for testing. A standard scaler was then used to scale the features of the training and test data for all modalities (ECG, GSR, and ST) to a mean of 0 and a standard deviation of 1. This scaling is done before training an ML model on the data to improve the performance of the model.
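
The split and scaling steps described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the FastICA step is omitted, the helper names and the toy feature rows are hypothetical, and the scaler statistics are fitted on the training portion only (a common practice that the text does not spell out).

```python
import random
import statistics

def split_75_25(rows, labels, seed=0):
    """Shuffle the sample indices and split them 75% / 25%."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.75 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])

def fit_scaler(rows):
    """Per-feature mean and population standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return means, stds

def transform(rows, means, stds):
    """Scale every feature to zero mean and unit standard deviation."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical feature rows (two features per sample) with binary labels.
rows = [[60.0 + i, 0.5 * i * i] for i in range(8)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
X_train, X_test, y_train, y_test = split_75_25(rows, labels)
means, stds = fit_scaler(X_train)
X_train_scaled = transform(X_train, means, stds)
X_test_scaled = transform(X_test, means, stds)  # reuse the training statistics
```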

Feature selection for univariate analysis

Features were chosen differently for each physiological signal based on their capacity to capture important physiological responses to stress within each signal type. Here, each feature was evaluated for its sensitivity to stress-related changes within its corresponding physiological signal. The selected features were utilized to train ML models separately for each signal type, allowing stress classification based on univariate data.

ECG features- Time-domain features were selected to indicate variations in heart rate and ANS activity while statistical features increased robustness by capturing signal variability²³.
GSR features- Frequency-domain features can quantify variations in skin conductance frequency²⁴. Statistical metrics and energy-related features provided additional dimensions for capturing electrodermal activity variations during stress.
ST features- Statistical features, range-based metrics, and derived features captured thermal regulation dynamics influenced by stress.

Analysis methods

K-nearest neighbors (KNN)

Using a similarity metric such as Euclidean distance, KNN²⁵ determines the K nearest training points to a given test point. The algorithm then predicts the class of the test point as the majority class among its K nearest neighbors. This approach makes no assumptions about the underlying data distribution and can handle non-linear relationships between input and output variables. The disadvantage of KNN is that it can be computationally expensive for large datasets because the method must compute the distance between each test point and every training point. KNN is also sensitive to the value of K and the distance metric used, both of which affect prediction accuracy.
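
The prediction rule can be sketched in a few lines. The example below is illustrative only: the function name, the value K = 3, and the two-dimensional feature vectors are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical scaled feature vectors with binary labels.
train_X = [[0.1, 0.2], [0.0, 0.4], [0.3, 0.1], [1.0, 1.1], [0.9, 0.8], [1.2, 1.0]]
train_y = ["no stress"] * 3 + ["stress"] * 3

calm = knn_predict(train_X, train_y, [0.2, 0.2])
tense = knn_predict(train_X, train_y, [1.0, 0.9])
```

Every prediction scans the whole training set, which is the computational cost noted above.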

Support vector machine (SVM)

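SVM²⁶ is an ML algorithm that finds the hyperplane with the greatest margin between positive and negative input points. The hyperplane is defined by a weight vector w and a bias term b, and the distance between a data point x and the hyperplane is measured as:

$$d(x) = \frac{|w \cdot x + b|}{\lVert w \rVert}$$

SVM solves an optimization problem to find the hyperplane that maximizes the margin while minimizing the classification error. The optimization problem can be stated in two ways, as a primal problem over w and b or as its dual over the Lagrange multipliers. In the standard soft-margin formulation these are:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$$

$$\max_{a}\ \sum_i a_i - \frac{1}{2} \sum_i \sum_j a_i a_j y_i y_j \,(x_i \cdot x_j) \quad \text{subject to} \quad 0 \le a_i \le C,\ \ \sum_i a_i y_i = 0$$

where,

a_i are the Lagrange multipliers
y_i and y_j are the class labels for data points x_i and x_j, respectively
x_i ⋅ x_j is the dot product of the feature vectors x_i and x_j
ξ_i are the slack variables that allow margin violations
C is a hyperparameter that controls the trade-off between margin width and classification error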
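
The geometric quantities described above can be evaluated numerically. The sketch below assumes the standard soft-margin formulation with labels in {-1, +1}; the function names and the example hyperplane are hypothetical, and no training is performed.

```python
import math

def decision_value(w, b, x):
    """Signed score w . x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(w, b, x):
    """Euclidean distance from x to the hyperplane w . x + b = 0."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))

def soft_margin_objective(w, b, X, y, C=1.0):
    """Primal objective: 0.5 * ||w||^2 + C * sum of hinge losses."""
    hinge = sum(max(0.0, 1.0 - yi * decision_value(w, b, xi)) for xi, yi in zip(X, y))
    return 0.5 * sum(wi * wi for wi in w) + C * hinge

# Hypothetical hyperplane and two labelled points that both lie outside the margin.
w, b = [3.0, 4.0], -5.0
dist = distance_to_hyperplane(w, b, [3.0, 4.0])
objective = soft_margin_objective(w, b, [[3.0, 4.0], [0.0, 0.0]], [1, -1])
```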

Decision tree (DT)

In classification, the DT algorithm divides the data into two subsets at each split and uses an impurity criterion to determine the best feature for that split. Common criteria include Gini impurity and cross-entropy. The algorithm constructs a tree in which internal nodes serve as feature tests and leaves serve as class labels. It selects the best feature to divide the data and then repeats the process for each subset until the tree is built. Based on the input feature values, the DT predicts the class label by traversing the tree from root to leaf²⁷.
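
The two impurity measures, and the way a candidate split is scored, can be sketched as follows. The function names and the label lists are hypothetical; a full tree builder would evaluate this score for every candidate feature and threshold.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_impurity(left, right, impurity=gini):
    """Size-weighted impurity of the two subsets produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)

parent = ["stress"] * 3 + ["no stress"] * 3
pure_split = split_impurity(parent[:3], parent[3:])
mixed_split = split_impurity(["stress", "stress", "no stress"],
                             ["stress", "no stress", "no stress"])
```

A split that lowers the weighted impurity the most relative to the parent node is preferred; here the pure split is chosen over the mixed one.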

Random forest (RF)

RF is an ensemble learning algorithm that combines multiple decision trees to enhance classification accuracy and robustness. To prevent overfitting and increase diversity, each tree in the forest is trained on a randomly selected portion of the training data and a random subset of the input features. The final prediction is obtained by aggregating the predictions of all trees, either by majority vote or by averaging the probabilities. RF can also estimate the importance of each input feature based on the reduction in impurity obtained by using that feature in the trees. This feature importance can be used for feature selection or model interpretation²⁸.

Bagging SVM

Bagging SVM is an ML ensemble algorithm that combines the power of SVMs with the bagging concept. Bagging is a technique that involves training numerous models on different subsets of the data and combining their results to make a final prediction. In Bagging SVM, multiple SVM models are trained on different subsets of the data, and their outputs are merged by averaging or voting to obtain the final prediction. This method enhances the model's robustness and generalizability while decreasing prediction variance. By combining the predictions of numerous SVM models, Bagging SVM improves on the performance of a single SVM model²⁹. In this study, an SVM classifier was built with an RBF kernel and regularization parameter C = 1. A bagging classifier was then created with 10 base classifiers, each using 70% of the training data, with the SVM classifier as the base estimator.

Bagging DT

The Bagging DT algorithm³⁰ creates several DTs from different training data samples and feature subsets. These decision trees are then combined to create a powerful learner that determines the final classification. By generating a diverse set of decision trees, the bagging procedure helps to reduce overfitting. The Bagging DT algorithm uses majority voting or weighted voting to combine the outcomes of the individual decision trees. In its standard form, the weighted voting rule is:

$$f(x) = \arg\max_{c} \sum_{i} w_i \, \mathbb{1}\left[t_i(x) = c\right]$$

where f(x) is the predicted class label for input x, t_i(x) is the class label predicted by the i-th decision tree, and w_i is the weight given to the i-th decision tree based on its performance on the validation set.
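
Weighted voting can be sketched as summing the weights of the trees that vote for each class and returning the class with the largest total. The function name, the tree predictions, and the weights below are hypothetical; equal weights reduce the rule to majority voting.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# Hypothetical predictions of three trees for one input.
tree_predictions = ["stress", "no stress", "no stress"]

weighted = weighted_vote(tree_predictions, [0.9, 0.3, 0.4])  # validation-based weights
majority = weighted_vote(tree_predictions, [1.0, 1.0, 1.0])  # plain majority vote
```

In this toy case one highly weighted tree outvotes two weaker trees, so the weighted and majority decisions differ.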

AdaBoost

AdaBoost is a boosting algorithm for classification that combines multiple weak classifiers to produce a strong classifier. The algorithm trains weak classifiers iteratively on various subsets of the training data, with higher weights assigned to misclassified samples from earlier iterations. The final classifier is a weighted sum of the weak classifiers, with the weights decided by the classification accuracy of the weak classifiers. The AdaBoost method can be formulated as an optimization problem of minimizing the exponential loss function, with the weights updated using the AdaBoost update equation at each iteration³¹.
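
One boosting round of the classical discrete AdaBoost update can be sketched as follows. This assumes labels in {-1, +1} and a weak classifier whose weighted error lies strictly between 0 and 1; the function name and the toy predictions are hypothetical, and the text does not state which AdaBoost variant the authors used.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """Return the classifier weight alpha and the renormalised sample weights."""
    # Weighted error of the weak classifier (must satisfy 0 < err < 1).
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones scaled down.
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Four equally weighted samples; the weak classifier misclassifies the second one.
alpha, new_weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
```

After the update the misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.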

Gradient boosting (GB)

Gradient boosting is a ML algorithm that combines multiple weak classifiers to produce a strong classifier. The algorithm repeatedly trains weak classifiers on various subsets of the training data, updating the weights given to misclassified samples based on the gradient of a loss function at each iteration. The final classifier is a weighted sum of the weak classifiers, with the weights decided by the contribution of the weak classifiers to the overall loss function³².

Extreme gradient boosting (XGBoost)

XGBoost is an ensemble ML algorithm that improves the performance of binary classification models by using gradient boosting. XGBoost iteratively builds a sequence of decision trees, each one trained to correct the errors of the previous trees. It uses gradient descent to minimize an objective that is the sum of the loss evaluated for each instance and a regularization term. The loss function depends on the problem being solved; for binary classification it is typically the binary cross-entropy (logistic) loss³³.
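
For reference, the regularized objective commonly given for XGBoost can be written as below. This is the general form from the XGBoost literature rather than a statement of the settings used in this study; here K is the number of trees, f_k is the k-th tree, T is its number of leaves, w is its vector of leaf weights, and γ and λ are regularization hyperparameters. The last expression is the logistic loss for a label y in {0, 1} and a raw score ŷ.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2

l(y, \hat{y}) = -\left[\, y \log p + (1 - y) \log (1 - p) \,\right],
\qquad
p = \frac{1}{1 + e^{-\hat{y}}}
```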

Ground truth

The ground truth for this study is the binary labeling of the data as either stress or no stress. During the experiment, participants were subjected to four different stressors, and their physiological responses were recorded using wearable sensors, including ECG, GSR, and ST sensors. The data acquired during stressful conditions were labeled as stress data, while data acquired during non-stressful conditions were labeled as no-stress data. This binary labeling allowed for the differentiation and analysis of physiological responses during periods of stress versus relaxation. The ground truth was used to train and test the proposed model for detecting mental stress using wearable physiological signal data.

Model evaluation and performance measures

For each ML algorithm, models were trained using 75% of all subject data, with the remaining data used for testing. Each method was also subjected to tenfold cross-validation. This is a popular approach because it provides more consistent estimates of model performance, helps reveal overfitting, and makes better use of the available data.
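
The tenfold procedure can be sketched as an index generator: the samples are shuffled once, divided into ten folds, and each fold serves as the test set exactly once. The function name and the sample count of 200 are illustrative assumptions; the text does not state how many samples each participant contributed or whether folds were grouped by participant.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

splits = list(k_fold_indices(200, k=10))
```

A model is trained and scored once per pair, and the ten scores are averaged to obtain the cross-validated estimate.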

Precision, recall, F1-score, AUC, and accuracy were used as performance measures. Precision, recall, and accuracy are all essential when evaluating a mental stress detection system. High precision means that events flagged as stressful are truly stressful, which limits false positives and unnecessary interventions. High recall means that most real stressful events are identified, which limits false negatives and missed intervention opportunities. High accuracy indicates that the system classifies both stressful and non-stressful events correctly overall and can be relied on for timely and effective interventions. The AUC value measures the system's overall ability to discriminate between stressful and non-stressful events by computing the area under the receiver operating characteristic (ROC) curve; a higher AUC indicates better separation of the two classes across decision thresholds. The F1-score is the harmonic mean of precision and recall, so it is high only when both false positives and false negatives are few; it does not account for true negatives. A high F1-score therefore indicates that the system identifies stressful events reliably without raising many false alarms.

By taking into consideration all of these performance indicators, one may ensure that the system can effectively identify and differentiate between stressful and non-stressful occurrences, allowing for timely and effective interventions that can have a major impact on a person’s psychological well-being and mental health.
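
The five measures can be computed from predicted labels and scores as sketched below. The function names, labels, and scores are hypothetical; AUC is computed through its rank interpretation, the probability that a randomly chosen stress sample receives a higher score than a randomly chosen no-stress sample.

```python
def classification_metrics(y_true, y_pred, positive="stress"):
    """Precision, recall, F1-score and accuracy for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(pairs)}

def roc_auc(y_true, scores, positive="stress"):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels, predictions and stress scores for eight samples.
y_true = ["stress"] * 4 + ["no stress"] * 4
y_pred = ["stress", "stress", "stress", "no stress",
          "no stress", "no stress", "no stress", "stress"]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```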

Results and discussion

In this study, a novel device embedded with ECG, GSR, and ST sensors was used for data collection to classify mental stress into stress and no-stress conditions. The collected data was used to train and test nine different ML algorithms for the classification of mental stress.

Multivariate analysis

In this analysis, all the features are considered for training the classifiers. The results of the study are summarized in Table 3 and a graph for all classifier performance is shown in Fig. 4.

Table 3.

Results of different ML algorithms for mental stress classification using all features (multivariate analysis).

	Classifier	Precision	Recall	F1-Score	AUC	Accuracy (%)
ML classifiers	KNN	95.06	95.93	95.49	97.33	95.15
	SVM	94.86	94.81	94.84	97.77	94.97
	DT	94.34	96.72	95.52	96.12	95.66
	RF	95.09	97.23	96.21	96.03	96.03
Bagging	Bagging SVM	95.43	94.99	95.17	95.21	95.66
Bagging	Bagging DT	95.53	97.01	96.26	98.17	95.79
Boosting	AdaBoost	94.70	96.98	95.83	98.34	95.52
	GB	95.03	97.02	96.16	98.72	95.87
	XGBoost	95.15	97.44	96.29	96.14	96.17


Fig. 4 — Graph for all classifier performance.

According to the findings of this study, all of the tested ML algorithms classified mental stress with high accuracy, precision, recall, and F1-score. The highest accuracy was achieved by XGBoost (96.17%), followed by RF (96.03%). Ensemble methods generally matched or outperformed the single classifiers: RF, XGBoost, GB (95.87%), and Bagging DT (95.79%) all exceeded KNN (95.15%), SVM (94.97%), and DT (95.66%). The margins were small, however: Bagging SVM only equalled DT (95.66%), and AdaBoost (95.52%) fell slightly below it. This suggests that ensemble learning techniques may be more effective in handling the complexity and variability of the data.

The highest AUC was achieved by GB with an AUC of 98.72%, followed by AdaBoost with an AUC of 98.34%. This suggests that these algorithms are better at distinguishing between the two classes, i.e., stress and no-stress conditions. The outcomes of this study suggest that the device embedded with an ECG sensor, a GSR sensor, and an ST sensor is a useful tool for the classification of mental stress. The high accuracy and AUC values achieved by the tested ML algorithms indicate that this approach can potentially be used as a reliable and objective tool for mental stress detection and monitoring. Figure 5 shows a scatter plot of all performance measures against each other.

Fig. 5 — Scatter plot graph for all performance measures with each other.

Univariate analysis

The results of the ML algorithms for mental stress classification using single features, known as univariate analysis, are shown in Table 4. The accuracy of each classifier is reported for each feature of the ECG, GSR, and ST signals.

Table 4.

Results of different ML algorithms for mental stress classification using a single feature (univariate analysis). Values are classifier accuracy (in %).

Signal	Feature name	KNN	SVM	DT	RF	Bagging SVM	Bagging DT	AdaBoost	GB	XGBoost
ECG	IBI	68.28	66.53	70.58	72.62	69.37	71.32	70.42	71.37	73.92
	SDNN	82.53	79.31	82.89	84.55	80.79	84.02	81.22	82.76	85.92
	SDSD	59.83	59.67	59.98	60.32	60.09	60.41	59.84	60.07	60.23
	RMSSD	74.39	72.28	76.29	79.02	74.22	78.31	75.21	78.91	80.32
	pNN20	62.03	59.35	63.29	65.32	61.97	64.44	62.48	64.12	65.38
	pNN50	63.64	60.27	64.28	65.90	62.52	64.78	64.28	65.28	67.23
	MAD	55.23	54.41	57.12	60.37	55.86	58.90	60.26	62.49	67.34
	SD	65.28	62.48	67.25	69.36	64.21	69.20	67.49	69.34	71.23
	SD1	73.34	71.32	75.25	77.14	73.20	76.51	73.40	75.32	78.84
	SD2	70.27	69.02	72.38	73.21	70.34	73.29	70.21	71.48	74.25
	SD1/SD2	49.08	49.14	49.02	49.18	49.11	49.07	48.87	49.23	49.06
GSR	FT	72.58	71.92	70.21	76.45	73.12	71.89	75.21	76.78	77.32
	DWT	69.32	67.09	70.76	71.87	68.92	73.45	72.54	73.23	75.89
	Mean	75.14	73.21	74.76	76.34	74.45	75.56	73.89	75.12	77.01
	Variance	70.98	68.45	72.34	71.56	69.23	73.12	70.87	72.65	74.09
	Kurtosis	67.23	66.12	68.76	69.45	66.89	71.32	67.76	69.92	70.67
	Skewness	73.56	71.34	74.01	73.12	72.52	75.45	74.21	75.09	76.78
	EM	68.32	64.78	67.45	69.23	65.89	68.56	66.92	67.21	69.01
ST	Mean	73.58	72.12	72.89	74.56	73.32	72.76	74.21	74.89	75.67
	Min–Max	68.21	67.45	67.89	69.32	68.76	68.01	68.45	68.92	70.23
	Range	72.76	70.89	71.32	73.56	72.45	71.98	70.25	72.67	74.12
	Variance	75.23	72.78	74.45	75.92	74.67	74.21	74.32	75.56	77.01
	SD	70.89	68.32	70.01	71.45	70.67	70.12	69.56	70.76	72.09
	RMS	74.56	72.89	73.45	74.54	74.21	73.67	73.12	74.45	76.09
	Median	71.23	69.76	70.56	71.98	70.76	70.32	70.09	71.56	73.01
	MAD	69.45	67.67	68.92	69.76	69.01	68.56	68.12	69.32	70.78
	MSAD	72.32	69.67	71.98	73.23	72.09	71.56	71.01	72.45	74.01
	AA	68.75	65.12	68.45	69.67	68.92	68.32	68.89	69.45	70.67
	WL	71.56	68.86	70.76	72.21	71.32	70.76	70.45	71.98	73.23
	ZCR	70.12	67.45	69.67	70.89	70.23	69.76	69.32	70.67	72.12
	AC	73.01	70.21	72.32	73.67	72.45	71.98	70.23	72.76	74.01


Here, XGBoost achieved the highest accuracy for 27 of the 31 features. The exceptions were SDSD and Kurtosis, where Bagging DT was marginally better, SD1/SD2, where GB was marginally better, and EM, where RF was marginally better. This implies that XGBoost is a dependable ML method for detecting mental stress based on single features. RF and GB also performed well across many features, indicating their utility for this classification task.

The accuracy for ECG features such as IBI, SDNN, and RMSSD varied between classifiers (66.53–73.92%, 79.31–85.92%, and 72.28–80.32%, respectively), with SDNN and RMSSD giving the highest accuracy among the ECG features. In contrast, SD1/SD2 stayed close to 49% for every classifier. Among the GSR features, the mean value performed consistently across the ML classifiers (73.21–77.01%), and FT and DWT reached comparable accuracy with the best classifier (77.32% and 75.89% with XGBoost). The accuracy for ST features varied with the feature and the ML classifier, with Mean, Variance, and RMS performing best (75.67%, 77.01%, and 76.09% with XGBoost). These results highlight the significance of specific features for each signal type: SDNN from ECG, the mean value from GSR, and the variance from ST were the most useful for mental stress classification.

Evaluation of benchmark datasets

To further validate the robustness and generalization of the proposed model, we evaluated it on two publicly available benchmark datasets, SWELL-KW and WESAD, using XGBoost, the best-performing classifier from our study. As there are some differences in the signals available across datasets, we used only the common signals (ECG and EDA/GSR) for this comparison. The preprocessing and feature extraction procedures were standardized to ensure consistency across datasets. The results are summarized in Table 5 and show that the proposed model performs consistently well across datasets, achieving accuracy comparable to that of prior studies. The comparison highlights the adaptability of the proposed model to diverse physiological signals and stressors.

Table 5.

Performance of proposed model on benchmark datasets.

Dataset	Signals used	Participants	Stressors used	XGBoost accuracy (%)
SWELL-KW¹⁹	ECG, EDA/GSR	25	Cognitive stress in workplace tasks	92.38
WESAD²⁰	ECG, EDA/GSR	15	Baseline, stress, meditation	94.21
Our collected dataset	ECG, GSR, ST	200	Watch funny videos, arithmetic tests, listen to favorite music, stroop color word test	96.17


The SWELL-KW dataset, collected in a workplace scenario, provides a different context, with cognitive stress as the key focus. Despite the changes in stressor types and a smaller sample size, the model attained an accuracy of 92.38%, demonstrating its adaptability in a variety of situations. The WESAD dataset includes stress-inducing scenarios with physiological signals captured at baseline, meditation, and stress levels. The model obtained 94.21% accuracy on WESAD, demonstrating its flexibility for multimodal stress detection tasks. The results show that the proposed model performs well across datasets with varying participant population sizes, stress-inducing situations, and signals. While accuracy on benchmark datasets decreases slightly due to smaller sample sizes and diverse stressor types, the model still performs competitively, indicating its robustness.

In addition to evaluating the robustness of the proposed model, these benchmark datasets were chosen for their compatibility with the physiological signals used in this study. Relatively few public datasets include signals that overlap with the three used by the proposed model, and these two are among them. The model's strong performance on SWELL-KW (92.38%) and WESAD (94.21%) not only supports its reliability but also demonstrates its adaptability to diverse datasets. Validating on both a newly collected dataset and public benchmarks balances originality and reproducibility and strengthens the evidence for the proposed system's generalizability.

Performance comparison with state-of-the-art models

To validate the effectiveness of our proposed study, we have compared its performance and other important factors with some previous state-of-the-art models which are described in Table 6.

Table 6.

Comparison of the proposed study with state-of-the-art models.

References	Dataset size	Sensors used	Stressors	No. of features	Algorithms achieved the highest accuracy	Highest accuracy achieved (%)	Key features
⁵	14	Pulse sensor	Mental arithmetic task, stroop word color test	19	SVM	94.33	Focused on PPG signals and ultra-short-term recordings
¹⁰	8	ECG, GSR, ST, breathing rate	Physical stress, mental stress, combined stress	56	Bagged trees	94.7	Focused on combining physical and mental stress detection using ML
¹¹	74	ECG	Public speaking, arithmetic task, horror movie	26	SVM	90.5	Focused on ultra-short-term HRV analysis using various time lengths
¹⁴	20	ECG	12-h laboratory task	13	Decision tree	93.30	Focuses on novel use of smart T-shirts for real-time ECG acquisition
³⁴	20	ECG, voice, facial expressions	MIST Experiments	–	ResNet50 I3D with TAM	85.1	Focuses on creating a real-time deep learning system that effectively integrates multiple modalities
³⁵	32	Pulse wave data	Mental arithmetic tasks	64	SVM	95.26	Analyzes multi-domain pulse rate variability features over different time lengths; time- and frequency-domain features perform best, with 3 min optimal for stress
³⁶	20	BPV, EDA, ST, facial expression, auditory data	Everyday work-related tasks	–	Deep Neural Network Model	94	The model processes data efficiently for various time windows, resulting in accurate stress detection and classification performance
³⁷	27	ECG, PPG, EDA, Seismocardiogram (SCG), ballistocardiogram (BCG), and respiratory effort	Mental arithmetic, N-back, stroop color test	–	C-VI (Collective Variational Inference) method	85	The model uses prominent digital signatures, establishing a priori probability density functions to improve classification accuracy
³⁸	15	PPG, GSR	Backward arithmetic subtraction, stroop color word test	14	RF	80 ± 8.31	RF classifier utilized Gini optimization to enhance prediction accuracy
Proposed study	200	ECG, GSR, ST	Watch funny videos, arithmetic test, listen to favorite music, stroop color word test	31	XGBoost	96.17	Focuses on low-cost, multimodal signal integration with robust univariate and multivariate analysis


Study limitations and future work

This study presents a novel framework for mental stress detection using multimodal wearable sensors and machine learning techniques. Still, several limitations should be acknowledged. First, although the dataset contains 200 participants, which is more than in many previous studies, expanding the dataset would improve the generalizability of the model. Additionally, the stressors used in this study may not fully capture the variety of real-world stress conditions, which limits the direct relevance of the findings; other stressors could be explored. Furthermore, the model's real-time performance has so far been examined only in a small pilot study, and its behaviour in settings where environmental noise and artifacts from movement and external factors influence sensor readings remains untested.

To address these limitations, future work should explore the collection and use of larger and more diverse datasets to ensure broader robustness and generalizability. Additional physiological signals such as EEG, respiration, and blood oxygen levels, or other advanced wearable devices, could provide richer data to improve the accuracy and reliability of stress detection. Longitudinal studies evaluating the model's performance over extended periods and under varying conditions are also essential to ensure its adaptability in real-world applications. Developing and integrating interpretability methods (e.g., LIME or SHAP) would make the model's decisions clearer and more accessible to non-technical users, including healthcare professionals and caregivers. While the proposed system is computationally efficient, additional effort can focus on further optimizing the model for low-power and resource-constrained environments for seamless deployment in wearable devices. By addressing these areas, future work can pave the way toward a more flexible, accurate, and user-friendly system for mental stress detection and management.

Also, while only two public benchmark datasets were chosen for external validation, the selection was driven by the scarcity of publicly available datasets that include the required physiological signals under a variety of stress-inducing situations. The SWELL-KW and WESAD datasets were chosen because they contain high-quality ECG and GSR/EDA signals acquired using commercial-grade sensors, which are consistent with the data collected in this study. Furthermore, the newly collected dataset of 200 participants, obtained with four different stressors, compensates for this constraint by providing a bigger and more diverse data source than is generally available. This not only strengthens the training but also adds a valuable new resource to the stress detection research community.

Future studies will focus on real-time testing with a larger number of participants to improve the practical usability of the proposed stress detection approach. This will involve gathering live physiological signals from users wearing our device in dynamic, real-world settings. Real-time validation will be useful in evaluating model performance in the presence of environmental noise, participant movement, and different stressors. It will also enable researchers to optimize the model for real-time stress monitoring applications, ensuring its robustness and adaptability.

Real-time evaluation of the proposed stress detection system

To validate the real-time performance of the proposed system, a real-time pilot study was conducted on 10 new students using the developed device and the same four stressors. Then the physiological signals were collected in real-time and processed through the trained XGBoost model to predict the mental stress state instantly. Table 7 shows the actual and predicted labels for all 10 participants and the accuracy and average confidence score for all four tasks. Out of a total of 40 task samples, 39 were correctly classified, resulting in a real-time classification accuracy of 97.5% and average prediction confidence of 95.3%. A single misclassification occurred during the Stroop test for 1 participant (P5), where the model predicted a no-stress state despite the stress-inducing task. This could be a result of the participant’s delayed or unusual physiological reaction patterns, highlighting the natural variation in human stress responses, which is a well-known problem in real-time bio-signal systems. Despite this small variation, the results show that the system can classify mental stress quickly, accurately, and reliably in a real-world setting. The average latency from signal acquisition to classification was about 1.2 s, indicating that the system is suitable for real-time applications such as student evaluation, workplace wellness, and cognitive workload evaluation.

Table 7.

Real-time classification results for 10 new participants using all four tasks.

Participants	Tasks	Actual label	Predicted label	Accuracy (%)	Avg confidence (%)
P1	Arithmetic test	Stress	Stress	100	96.3
	Stroop test	Stress	Stress
	Watching funny video	No stress	No stress
	Listening favorite music	No stress	No stress
P2	Arithmetic test	Stress	Stress	100	95.9
	Stroop test	Stress	Stress
	Watching funny video	No stress	No stress
	Listening favorite music	No stress	No stress
P3	Arithmetic test	Stress	Stress	100	96.5
	Stroop test	Stress	Stress
	Watching funny video	No stress	No stress
	Listening favorite music	No stress	No stress
P4	Arithmetic test	Stress	Stress	100	96.8
	Stroop test	Stress	Stress
	Watching funny video	No stress	No stress
	Listening favorite music	No stress	No stress
P5	Arithmetic test	Stress	Stress	75	86.4
	Stroop test	Stress	No stress
	Watching funny video	No stress	No stress
	Listening favorite music	No stress	No stress
P6	Arithmetic test	Stress	Stress	100	95.8
	Stroop test	Stress	Stress
	Watching funny video	No stress	No stress
	Listening favorite music	No stress	No stress
P7	Arithmetic test	Stress	Stress	100	96.7
	Stroop test	Stress	Stress
	Watching funny video	No stress	No stress
	Listening favorite music	No stress	No stress
P8	Arithmetic test	Stress	Stress	100	96.4
	Stroop test	Stress	Stress
	Watching funny video	No stress	No stress
	Listening favorite music	No stress	No stress
P9	Arithmetic test	Stress	Stress	100	95.5
	Stroop test	Stress	Stress
	Watching funny video	No stress	No stress
	Listening favorite music	No stress	No stress
P10	Arithmetic test	Stress	Stress	100	96.0
	Stroop test	Stress	Stress
	Watching funny video	No stress	No stress
	Listening favorite music	No stress	No stress


Conclusion

The present study has proposed a new framework that integrates data from wearable physiological signals with a diverse range of bagging and boosting algorithms to detect mental stress accurately. The proposed model is characterized by its high performance, automation, and relatively low time and computation complexity. Additionally, the study has created a comprehensive dataset of 200 participants, generated by subjecting them to four different stressors, which allowed for the proper training and evaluation of the proposed model. By leveraging a low-cost device built from an Arduino microcontroller and ECG, GSR, and ST sensors, the proposed model can be an effective and accessible tool for detecting mental stress. Notably, XGBoost and RF, with accuracies of 96.17% and 96.03%, were the most accurate among the nine ML algorithms used to classify mental stress. The univariate analysis also showed that XGBoost obtained good accuracy across most single features, highlighting its dependability in detecting mental stress. RF and GB also performed well in a variety of settings. Specific features, such as SDNN for ECG, mean for GSR, and variance for ST, were found to be significant for each signal type. The proposed model, which integrates low-cost wearable sensors and ML algorithms, therefore has promising prospects for accurately detecting mental stress. Validation against benchmark datasets (SWELL-KW and WESAD) demonstrated the model's adaptability and generalizability, with 92.38% and 94.21% accuracy respectively. In addition, a real-time pilot test involving 10 new participants gave a live prediction accuracy of 97.5% with low latency (about 1.2 s), supporting the practical viability of the proposed model for measuring mental stress in real-world settings. The proposed model represents an automatic and objective approach to detecting mental stress, which can facilitate early intervention and treatment and ultimately improve mental health and well-being. In conclusion, the findings underscore the potential of the proposed framework as a tool for the detection of mental stress. Deep learning techniques will be explored in the future to increase the system's efficiency.

Acknowledgements

We would like to thank all of the RIMS, Ranchi, and BIT, Mesra, Ranchi participants for their cooperation on this project.

Author contributions

S.G.: writing of the original draft, conceptualization, and methodology. S.D.: review and editing of the draft, supervision. R.J.: review and editing of the draft, co-supervision.

Funding

The authors report no funding.

Data availability

Data for this research paper is available upon request. Please contact the corresponding author for access.

Declarations

Competing interests

The authors declare no competing interests.

Ethical approval

All procedures conducted in studies that involve human participants were conducted in compliance with the ethical standards of the institutional research committee. The study was also conducted in accordance with the 1964 Helsinki Declaration, as well as its later amendments, or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study. Participants were given clear and straightforward information on the nature of the study, its objectives, and its potential risks and benefits, and they had the opportunity to ask questions before their data were recorded.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
