Skip to main content
PLOS One logoLink to PLOS One
. 2025 Jun 2;20(6):e0323265. doi: 10.1371/journal.pone.0323265

Early detection of occupational stress: Enhancing workplace safety with machine learning and large language models

Mohammad Junayed Hasan 1, Jannat Sultana 1, Silvia Ahmed 1, Sifat Momen 1,*
Editor: Ashad Kabir2
PMCID: PMC12129211  PMID: 40455805

Abstract

Occupational stress is a major concern for employers and organizations as it compromises decision-making and overall safety of workers. Studies indicate that work-stress contributes to severe mental strain, increased accident rates, and in extreme cases, even suicides. This study aims to enhance early detection of occupational stress through machine learning (ML) methods, providing stakeholders with better insights into the underlying causes of stress to improve occupational safety. Utilizing a newly published workplace survey dataset, we developed a novel feature selection pipeline identifying 39 key indicators of work-stress. An ensemble of three ML models achieved a state-of-the-art accuracy of 90.32%, surpassing existing studies. The framework’s generalizability was confirmed through a three-step validation technique: holdout-validation, 10-fold cross-validation, and external-validation with synthetic data generation, achieving an accuracy of 89% on unseen data. We also introduced a 1D-CNN to enable hierarchical and temporal learning from the data. Additionally, we created an algorithm to convert tabular data into texts with 100% information retention, facilitating domain analysis with large language models, revealing that occupational stress is more closely related to the biomedical domain than clinical or generalist domains. Ablation studies reinforced our feature selection pipeline, and revealed sociodemographic features as the most important. Explainable AI techniques identified excessive workload and ambiguity (27%), poor communication (17%), and a positive work environment (16%) as key stress factors. Unlike previous studies relying on clinical settings or biomarkers, our approach streamlines stress detection from simple survey questions, offering a real-time, deployable tool for periodic stress assessment in workplaces.

Introduction

The well-being and safety of the working population is a crucial concern in today’s modern society. Maintaining good health and ensuring workplace safety is crucial for individuals to lead productive and fulfilling lives, contributing to the overall prosperity of communities and nations. One key aspect of population health is occupational health [13], which refers to the physical, mental, and social well-being of individuals in relation to their work environments. Occupational stress [4, 5], a prevalent workplace safety hazard in various professions, can have detrimental effects on individuals’ health, impacting their productivity, job satisfaction, overall quality of life, and most critically, their workplace safety behavior. Studies have shown that stressed workers are significantly more likely to be involved in workplace accidents due to decreased attention, impaired decision-making, and compromised safety protocol adherence [6, 7]. Literatures also indicate that prolonged exposure to heavy stress reduces life expectancy by 2.8 years [8]. For middle-aged individuals, the work-related stress is found to reduce healthy life expectancy by an average of 1.7 years [9], highlighting the critical need for effective stress detection and management systems in workplace safety protocols.

Occupational stress not only poses significant health and safety concerns for individuals but also creates substantial economic challenges for organizations and society as a whole. Studies, including [8], have highlighted the considerable costs related to absenteeism, presenteeism, and healthcare expenses stems from occupational stressors. From a workplace safety perspective, stressed employees are more likely to make errors, violate safety protocols, and be involved in workplace accidents, making stress detection and management a critical component of comprehensive safety systems. Addressing and mitigating occupational stress is therefore crucial, as it affects both individual safety and has broader implications for organizational effectiveness and societal well-being. Research, such as [10], indicates that higher stress levels often lead to increased absenteeism among employees – thus absenteeism can be an indicator of occupational stress. In their study, they employed machine learning techniques to predict absenteeism in a company – thus creating a model that aims to comprehend occupational stress.

Existing approaches to occupational stress detection have primarily relied on traditional methods such as self-report questionnaires [11, 12], longitudinal studies [13, 14], statistical analyses [15, 16], and observational studies [17, 18]. While these methods have contributed to our understanding of occupational stress, they present several critical limitations from a workplace safety perspective: (1) they often lack the ability to capture the complex interplay of factors that influence stress levels in diverse work environments, potentially missing early warning signs of safety risks; (2) their reactive nature fails to predict potential stress-related safety incidents before they occur; (3) the time-consuming and subjective nature of these approaches makes them impractical for real-time safety monitoring; and (4) they often operate in isolation, missing the potential synergies between different analytical approaches that could provide more comprehensive insights.

Furthermore, existing computational approaches to stress detection typically focus on either machine learning techniques or natural language processing in isolation, failing to leverage the complementary strengths of multiple AI domains. This siloed approach limits the accuracy and reliability of stress detection systems, with current state-of-the-art methods achieving accuracy rates below 85% in real-world applications. Additionally, most existing studies utilize older datasets that may not reflect current workplace dynamics and stressors, particularly in the context of evolving work environments and safety requirements. A particular gap exists in the integration of explainable AI techniques with stress detection systems, making it difficult for safety managers to understand and act upon the model’s predictions.

The pressing need for more effective occupational stress detection and management is further emphasized by current workplace safety statistics. Stress-related safety incidents account for a significant portion of workplace accidents [6, 7, 19, 20], yet existing detection methods lack the predictive capabilities necessary for proactive intervention. Consequently, there is an urgent need to explore innovative solutions that can effectively address these limitations and provide more comprehensive and proactive strategies for occupational stress management, with the specific aims of: (1) achieving detection accuracy above 90% through multi-model integration; (2) providing real-time, interpretable predictions suitable for safety management applications; and (3) developing a scalable framework that can adapt to different workplace contexts while maintaining consistent performance metrics.

In this study, we comprehensively detect and analyze occupational stress by leveraging all three crucial domains of AI: machine learning [21], deep learning [2224], and natural language processing [25, 26]. Our work utilizes a newly published dataset from February 2024, consisting of Malaysian workplace survey data that has not been previously used for occupational stress detection. This temporal relevance ensures our findings reflect current workplace dynamics and safety challenges. The methodology introduces several innovative components: (1) a systematic preprocessing pipeline that handles complex survey responses while preserving their safety implications; (2) a novel feature fusion approach that combines multiple feature selection techniques, demonstrated to improve model performance by 5-10% compared to single-technique approaches; and (3) a unique algorithm for converting tabular data into natural language sentences, enabling more nuanced domain analysis [25] of stress factors through large language models (LLMs).

The preprocessed data is used to train and evaluate 11 different machine learning algorithms, a one-dimensional deep convolutional neural network (1D-CNN), and five LLMs for stress classification. Domain analysis with LLMs revealed that occupational stress patterns align more closely with biomedical domains than clinical domains, with biomedical language models outperforming their clinical counterparts in stress detection. The model’s generalizability was further validated through extensive synthetic data experiments, demonstrating robust performance across different workplace scenarios and strengthening its reliability for real-world safety applications. The main contributions of this study are summarized below:

  • A pioneering integration of machine learning, deep learning, and natural language processing for occupational stress detection, achieving a previously unattained accuracy of 90.32% in stress classification.

  • The first application of AI techniques to a novel Malaysian survey dataset, providing fresh insights into contemporary workplace stress patterns through detailed analysis with ablation studies and explainable-AI.

  • Empirical demonstration that combining feature importance and feature ranking techniques significantly enhances model performance by 5-10%, extracting more predictive features than individual methods.

  • Development of a novel algorithm for converting tabular occupational stress data into natural language sentences, enabling more nuanced stress analysis through LLMs with a 100% information preservation rate.

  • Superior performance over eight recent state-of-the-art methods on the same dataset, with the ensemble model achieving the highest recorded accuracy (90.32%) and F1-score (89.20%) in occupational stress detection.

  • Deployment of the developed model on a public repository, making it immediately accessible for practical workplace safety applications, with documented response times under 100ms for real-time stress assessment.

  • Extensive validation of model generalizability through synthetic data generation using four different techniques (Gaussian Copula, CTGAN, TVAE, and Copula GAN), achieving 89.00% accuracy on unseen test scenarios and demonstrating robust performance in varied workplace contexts.

These contributions collectively advance the field of occupational safety science by providing a more accurate, interpretable, and practical approach to stress detection and management. The framework’s ability to process diverse data types and provide explainable results makes it particularly valuable for safety managers and organizational decision-makers. Our quantitative improvements over existing methods, combined with the framework’s practical deployability, represent a significant step forward in proactive workplace safety management through stress detection and mitigation.

The remainder of this article is structured as follows: The sec:related_work Related work section reviews relevant literature and identifies current research gaps. The sec:materials_methods Materials and Methods section describes the dataset, exploratory data analysis, computational methods, ensemble modeling approach, natural language sentence generation algorithm, and synthetic data generation techniques. The sec:results Results section presents experimental findings, including comparisons with existing methods, external validation using synthetic data, ablation studies, and explainable AI analyses of safety-critical features. The sec:discussion Discussion section explores the methodological significance of the study, its implications for workplace safety management, and its limitations, along with directions for future research. Finally, the sec:conclusions Conclusions section summarizes the main findings, contributions, limitations, and potential future directions.

Related work

Occupational stress has been extensively studied due to its significant implications for workplace safety, individual well-being, and organizational performance. The literature can be categorized into three main streams: traditional safety and health studies, historical perspectives, and computational approaches to stress detection and management.

In the context of workplace safety and health outcomes, various studies have established critical links between occupational stress and serious health risks. Research has demonstrated strong correlations between job stress and cardiovascular health [27, 28], metabolic syndrome [29], and other safety-critical health outcomes. Organizational safety factors, including workplace dynamics and social support systems, have been shown to significantly influence stress levels and job satisfaction [4, 3032], directly impacting workplace safety behaviors.

The role of organizational safety climate in stress management has been extensively investigated. Studies have highlighted how job control, work demands, and social support systems can mitigate stress-related safety risks [3335]. Work-family conflict and organizational justice have emerged as significant factors affecting workplace safety culture [36]. Physical safety impacts of occupational stress have been documented across diverse professional contexts [3740], emphasizing the need for comprehensive stress management in workplace safety protocols.

From a historical perspective, the recognition of occupational stress as a workplace safety concern dates back to ancient times [41]. Cross-cultural studies have revealed that stress affects workplace safety and mental health differently across various cultural contexts [42], with psychological factors like self-efficacy serving as protective mechanisms [43]. This historical understanding has shaped modern approaches to workplace safety management.

A growing body of AI-based research has recently emerged to address occupational stress detection more effectively. For instance, [44] applied four machine learning classifiers (random forest, support vector machine, K-nearest neighbors, and artificial neural network) and compared them with logistic regression to predict chronic stress in medical practice assistants. The random forest classifier yielded the best performance, improving the area under the curve by over 20% compared to the logistic model. The authors identified excessive workload, high demand for concentration, and insufficient leadership support as key contributors to stress. Similarly, [45] investigated stress prediction using self-reported data and biomarkers, highlighting that wearable technologies combined with machine learning can reveal new insights into employees’ stress patterns.

[46] proposed a stress prediction method using the Perceived Stress Scale and machine learning, where logistic regression provided a 99% accuracy, underscoring the relevance of questionnaire-based data for reliable stress detection. In the context of the COVID-19 pandemic, [47] utilized the XGBoost algorithm to predict employee stress, finding that working hours, workload, age, and role ambiguity significantly influenced performance. Focusing on anxiety state detection, [48] demonstrated that XGBoost performed best, and further employed SHAP to interpret their model. In [49], a hybrid depression assessment scale was developed, and multiple machine learning and deep learning models were tested. Random Forest achieved the highest accuracy of 98.08%, with LIME explanations providing transparent insights into model decisions. Elsewhere, [50] explored machine learning approaches for predicting depression risk in workplaces, finding that random forest had the highest accuracy (88.7%) and revealing gender, physical health, and psychosocial risk/protective factors as critical influences. Lastly, [26] extended these AI methodologies to the domain of life satisfaction, converting tabular data into natural language for large language model processing and reaching an accuracy of 93.80%. The authors highlighted the importance of interpretability and domain adaptation in model deployment.

Table 1 provides a comprehensive summary of key studies in occupational stress detection, highlighting diverse methodologies, findings, and limitations across various domains and safety contexts.

Table 1. Summary of the relevant studies in occupational stress detection.

Study Domain/Industry Methodology Key Findings Limitations
Traditional Approaches
[27] Healthcare Effort-reward imbalance model and job strain model Imbalance between efforts and rewards significantly increased risk of coronary heart disease Job strain model was less predictive
[28] Steel Industry Prospective cohort study High job strain and effort-reward imbalance elevated cardiovascular mortality risk Results specific to Finnish metal industry may not generalize
[29] Various Professions Longitudinal study Chronic work stress more than doubled the risk of metabolic syndrome Limited to specific professions
[30] Organizational Survey and observational study Negative workplace dynamics are associated with withdrawal behaviors, job dissatisfaction, and burnout Subjective measures
[31] Healthcare Survey High levels of burnout and psychiatric morbidity among UK consultants Specific to UK consultants
[4] Healthcare Survey Occupational stress in nurses linked to declines in job performance Focused on nurses
[32] Various Professions Meta-analysis Social support buffers the negative effects of stress Conflicting results
[33] Education Survey Social support reduces burnout, especially with positive feedback from supervisors Context-specific
[34] Various Professions Survey High job demands and low decision latitude lead to higher strain and burnout Limited to specific contexts
[35] Various Professions Survey High work demands and low social support linked to disturbed sleep and stress Limited to specific contexts
[36] Education Survey Work-family conflict mediates relationship between organizational justice and stress Specific to university faculty
[37] Accounting Historical analysis Cyclic occupational stress showed increased serum cholesterol and accelerated blood clotting times Specific to accounting profession
[38] Various Professions Survey and observational study Mobbing has severe mental and psychosomatic health consequences comparable to PTSD Subjective measures
[39] Various Professions Survey Bullying at work leads to lower social support and increased symptoms of anxiety Specific to bullying contexts
[40] Various Professions Meta-analysis High job strain and low job control associated with increased blood pressure Mixed findings
Historical and Comparative Studies
[41] Historical Literature review Historical perspectives on occupational stress highlight longstanding recognition General overview
[42] Various Professions Comparative study Occupational stress affects mental health differently across cultures Culture-specific
[43] Education Survey Self-efficacy protects against job strain and burnout, especially for younger teachers Focus on teachers
Machine Learning Approaches
[44] Tech Industry Machine learning classifiers Random forest and support vector machines predicted chronic stress with high accuracy Traditional logistic regression models underperformed
[45] Healthcare Wearable devices and biomarkers Wearable devices combined with machine learning offer new avenues for monitoring stress Dependent on quality and availability of wearable devices
[46] Healthcare Perceived Stress Scale technique Logistic regression and random forest models showed high accuracy in stress prediction Limited to perceived stress and does not account for objective measures
[47] Tech Industry Machine learning models for predicting stress levels Identified working hours and role ambiguity as significant predictors of stress Focus on pandemic-specific factors may limit applicability to other contexts
[48] Tech Industry Machine learning techniques for stress prediction Identified significant predictors of stress during COVID-19 pandemic Dependent on data quality and specific healthcare settings
[51] Steel Industry Machine learning algorithms for stress prediction Predicted stress levels in insurance employees with high accuracy Data limitations and industry-specific factors may affect generalizability
[49] Education Machine learning models for predicting stress levels Identified working hours and role ambiguity as significant predictors of stress Model performance may vary across different industries
[50] Various Professions Machine learning models Predicted stress levels with high accuracy using socioeconomic data during pandemic Focus on pandemic-specific factors
[26] Various Professions Machine learning and large language models Predicting life satisfaction and well-being using converted tabular data Specific to Danish population

These studies capture the complex relationship between occupational stress and workplace safety. While the continuous development of AI techniques offers promising avenues for stress detection and management, existing approaches often lack integration between different methodological domains and fail to provide real-time, interpretable results suitable for practical safety management applications. Additionally, most current studies utilize historical datasets that may not reflect contemporary workplace safety challenges and stressors. This gap highlights the need for innovative, integrated approaches that can leverage multiple AI domains while maintaining interpretability for safety management applications.

Materials and methods

Materials

Dataset.

The dataset utilized in this study is obtained from a recently published research article by Majid et al. [52], released in February 2024. The data were collected from 11 November 2021 until 30 October 2022, focusing primarily on occupational stress, workplace safety indicators, job satisfaction, and job performance among Malaysian workers. The study employed a quantitative research approach through comprehensive questionnaire development and survey methodology. A sample of 309 participants from diverse occupational backgrounds was selected using simple random sampling, representing various workplace environments and safety contexts. The questionnaire gathered extensive information on respondents’ demographics, occupational stress factors, safety behaviors, job satisfaction metrics, and performance indicators. Ethical considerations were addressed through proper informed consent procedures. Additionally, the dataset does not contain any missing values. This contemporary dataset provides valuable insights into the complex relationships between occupational factors and workplace safety, particularly in the context of organizational health maintenance and stress management.

Exploratory data analysis.

The dataset is structured into four primary sections relevant to workplace safety and stress management: sociodemographic information (Section A), occupational stress indicators (Section B), job satisfaction metrics (Section C), and job performance measures (Section D). An additional health-related section (Section E) was also included in the original dataset. The responses are categorized into: (i) Nominal categorical variables comprising sociodemographic information, and (ii) Ordinal survey responses about occupational stress, job satisfaction, and job performance on a Likert scale, providing a comprehensive view of workplace safety dynamics.

Fig 1 illustrates the distribution of sociodemographic factors that could influence workplace stress and safety behaviors. Key safety-relevant observations include: a majority (67.3%) of respondents are in their prime working years (30-39 years), with most (91.8%) holding a bachelor’s degree or lower qualifications. Notably, 97.1% are employed full-time, and 62.8% have over 10 years of working experience, suggesting significant exposure to workplace stressors. The household income distribution follows a normal curve, with most participants (39.8%) in the middle-income bracket (RM3,970-RM7,099), which could influence workplace stress levels and safety behaviors.

Fig 1. Demographic and Socioeconomic Characteristics of the Survey Respondents: (a) Age Group Distribution, (b) Religion Distribution, (c) Ethnicity Distribution, (d) Marital Status Distribution, (e) Marriage Period Distribution, (f) Number of Children Distribution, (g) Educational Level Distribution, (h) Employment Status Distribution, (i) Working Period Distribution, and (j) Household Income Distribution.

Fig 1

The dataset includes 41 survey questions pertaining to occupational stress, labeled OS1 to OS41. These questions cover various factors influencing occupational stress, including workload demands, control over work, support from managers and peers, job role clarity, organizational changes, interpersonal relationships, and work-life balance. The survey employs a Likert scale to measure these dimensions.

Fig 2a presents grouped box plots showing the distribution of stress levels across different occupational safety categories. A higher value (5) indicates better stress management, while a lower value (1) suggests potentially hazardous stress levels. The analysis reveals that workload demands and work-family conflict are the primary contributors to occupational stress, potentially compromising workplace safety. The uniformity in manager support scores across respondents suggests its consistent role in workplace stress management.

Fig 2. Various data visualizations from the study: (a) Grouped box plots showing the data distribution for each occupational stress category, (b) Grouped box plots showing the data distribution for each job satisfaction category, (c) Box plots showing the data distribution for the responses of each job performance survey question, and (d) Bar plot showing the distribution of sperm quality among the respondents of the dataset used in the study.

Fig 2

The job satisfaction component comprises 36 survey questions (JS1-JS36), categorized into dimensions critical for workplace safety: pay adequacy, career progression, supervisory support, organizational benefits, reward systems, operational procedures, team dynamics, work nature, and safety communication. As shown in Fig 2b, satisfaction levels vary across categories, with operational procedures and promotion opportunities showing the lowest satisfaction scores, potentially indicating areas of stress that could impact safety behavior.

Job performance metrics (JP1-JP6) focus on two safety-critical aspects: work completion efficiency (JP1-JP3) and distraction avoidance (JP4-JP6). Fig 2c demonstrates consistent response patterns across these metrics, indicating reliable measurement of performance factors that could affect workplace safety. Additionally, all six items use a five point Likert scale (values from 1–5), with respondents clustering around similar score ranges for all questions. Since these items capture closely related facets of job performance under fertility challenges, their responses exhibit high correlation, resulting in overlapping medians, quartiles, and whiskers in the box plots. While most respondents report high performance levels, the presence of outliers suggests varying degrees of stress impact on work execution.

Lastly, the distribution of the sperm quality of the respondents is shown in Fig 2d. The bar plot shows that most of the respondents have a normal sperm quality type. However, a good portion of the respondents suffer from Oligozoospermia and Asthenozoospermia. The number of respondents suffering from Teratozoospermia and Asthenoteratozoospermia is the least, which are close to zero (one for each).

This comprehensive dataset enables detailed analysis of the relationships between occupational stress, workplace safety, and organizational performance, providing a solid foundation for developing predictive models for stress detection and management in workplace safety contexts.

AI models.

We employed a diverse set of ML algorithms, including Random Forest, AdaBoost, Decision Tree, Logistic Regression, Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Gaussian Naive Bayes, XGBoost, and LightGBM. Additionally, we developed a customized one-dimensional convolutional neural network (1D CNN) [22, 24, 53] to capture temporal patterns and dependencies in the survey response data, as illustrated in Fig 3. This 1D CNN architecture comprises three convolutional layers with batch normalization, ReLU activation, and max pooling, followed by two fully connected layers.

Fig 3. Architecture of the proposed 1D CNN for occupational stress detection.

Fig 3

Let 𝐗B×1×L represent a batch of B survey instances, each viewed as a single-channel sequence of length L (e.g., after encoding 39 features or time steps). Our 1D CNN (Fig 3) applies a series of convolutional, batch normalization, ReLU, and max pooling operations to extract hierarchical patterns, followed by fully connected layers for classification. Concretely, at the i-th convolutional layer with weights 𝐖(i) and biases 𝐛(i), the output 𝐙(i) is computed as

𝐙(i)=MaxPool(ReLU(BatchNorm(𝐖(i)*𝐙(i1)+𝐛(i)))) (1)

where * denotes the 1D convolution operator, ReLU(𝐱)=max(0,𝐱), and BatchNorm normalizes feature maps to stabilize training. The first convolutional layer takes 𝐙(0)=𝐗, while subsequent layers take 𝐙(i1) from the previous layer. After the third convolutional block, the feature maps are flattened and passed to fully connected layers:

𝐡1=ReLU(𝐖fc1Flatten(𝐙(3))+𝐛fc1),𝐡2=Dropout(0.5)(𝐡1), (2)

then

y=σ(𝐖fc2𝐡2+𝐛fc2), (3)

where σ(·) is the sigmoid function producing a scalar output y(0,1) denoting the probability of “no stress” (or “stressed,” depending on labeling). The network’s trainable parameters {𝐖(i),𝐛(i),𝐖fcj,𝐛fcj} are optimized via the binary cross-entropy loss:

(θ)=1Bn=1B[tnlogyn+(1tn)log(1yn)], (4)

where tn{0,1} is the ground truth label for sample n and yn is the predicted probability. The resulting network has 911,873 trainable parameters, computed by summing contributions from each layer:

  • Conv1d(1, 64, 3): (1×64×3)+64=192+64=256

    BatchNorm1d(64): 2×64=128

    Total: 256 + 128 = 384

  • Conv1d(64, 128, 3): (64×128×3)+128=24,576+128=24,704

    BatchNorm1d(128): 2×128=256

    Total: 24,704+256=24,960

  • Conv1d(128, 256, 3): (128×256×3)+256=98,304+256=98,560

    BatchNorm1d(256): 2×256=512

    Total: 98,560+512=99,072

  • Linear(1536, 512): (1536×512)+512=786,432+512=786,944

  • Dropout(0.5): No trainable parameters.

  • Linear(512, 1): (512×1)+1=512+1=513

Total trainable parameters: 384+24,960+99,072+786,944+513=911,873

Through hierarchical feature extraction, the 1D CNN aims to effectively capture non-linear relationships among occupational stress factors and provide better performance and insights than many simpler statistical ML models that lack this capacity. Moreover, from computational perspective, it provides a balance between simpler models and more complex, large models that require no or millions of parameters, compared to its 911K parameters only.

To enable natural language processing and domain analysis [25] on occupational stress data, we also utilized LLMs from a diverse set of pre-training domains such as BERT [54] (general domain), BioBERT [55] (biomedical domain), ClinicalBERT [56] (clinical domain), DischargeBERT [57] (clinical domain), and COReBERT [58, 59] (both biomedical and clinical domains) to classify generated natural language sentences for a comprehensive analysis of occupational stress, incorporating both clinical and biomedical contexts.

Implementation details.

All experiments and inferences were conducted on an Amazon Linux AMI operating system, utilizing an NVIDIA T4 14GB Tensor Core GPU with 32GB of RAM on the Amazon AWS EC2 cloud server. Each experiment was performed five times with the following random seeds: 1, 13, 24, 37, and 42. Scikit-learn [60], numpy [61], pandas [62], matplotlib [63], seaborn [64], and scipy [62] were used for ML algorithms, data manipulation, visualization, and statistical analyses, ensuring robust and reproducible results. The 1D CNN was implemented using PyTorch [65]. The Binary Cross Entropy loss function and the Adam optimizer were used [66]. For the LLMs, PyTorch and the Transformers library [67] were used. Texts were tokenized with WordPiece tokenizers, processing the first 512 tokens due to BERT-like model limitations. Each experiment was repeated three times, with results reported as mean and standard deviation. The AdamW optimizer [68] was employed used. The performance of the models was evaluated using accuracy, precision, recall, macro-averaged F1 score, and ROC-AUC score. The codes, implementation details and notebooks supporting the findings of this study is publicly available at https://github.com/junayed-hasan/occupational-stress-ml/.

Methods

System architecture and computational algorithm.

The system architecture for this study, shown in Fig 4, begins with survey data acquisition for occupational stress detection, followed by data pre-processing, feature selection, and classification using machine learning algorithms, a 1D CNN, and LLMs. An ensemble model is created with the three best-performing machine learning models, which is then deployed and explained using Explainable AI.

Fig 4. Schematic representation of the end-to-end occupational stress detection system architecture, illustrating the workflow from data acquisition through preprocessing, feature selection, and classification using ML algorithms, 1D CNN, and LLMs. The diagram highlights key steps including data pre-processing, feature elimination and selection techniques, natural language sentence generation, hyperparameter optimization, and the creation of an ensemble model, culminating in a deployable and explainable AI-driven stress detection system.

Fig 4

The overall computational approach is detailed in Algorithm 1. The algorithm consists of three main phases: (1) feature selection through RFECV-ANOVA integration, (2) natural language sentence generation, and (3) model training and ensemble creation. The algorithm’s modular design ensures extensibility and adaptability to different workplace safety contexts while maintaining computational efficiency.

Algorithm 1. Overall computational algorithm of the proposed framework.

graphic file with name pone.0323265.e547.jpg

Creating targets and binarizing target class.

The 41 survey questions related to occupational stress were aggregated to produce a single value representing the occupational stress level for each individual. The composite stress score OS for an individual is calculated as:

OS=i=141OSi (5)

where OSi denotes the value of the i-th occupational stress survey question.

The aggregation process is illustrated in Fig 5a. The composite scores were normalized using Min-Max Scaling:

Fig 5. (a) The aggregation process used to convert the 41 occupational stress columns into one single column representing occupational stress level of a person. (b) The class distribution of the data after creating the targets and binarizing the target class.

Fig 5

OSnormalized=OSOSminOSmaxOSmin (6)

The normalized scores were then binarized based on a threshold of 0.5:

Y={0if OSnormalized<0.51if OSnormalized0.5 (7)

where Y represents the binarized target class (0 indicates stress, 1 indicates no stress).

The class distribution after conversion is shown in Fig 5b, revealing imbalanced data with approximately 65% of samples in the “No Stress" class and 35% in the “Stress" class.

Data preprocessing.

The dataset was divided into training and testing sets using an 80-20 split ratio. This process was performed five times with different random seeds, and the mean results were reported with standard deviations.

Data normalization was performed using Min-Max Scaling on both training and testing data, ensuring all feature values are within the same range. Outliers were identified and handled based on the Interquartile Range (IQR) method for the training set only. Data points were considered outliers if they fell outside the following bounds:

Lower Bound=Q11.5×IQR (8)
Upper Bound=Q3+1.5×IQR (9)

Outliers were replaced by rounding the mean value of all other non-outlier data points.

Feature elimination.

The feature elimination process involved removing features with zero variance and those exhibiting high correlation with other features. Zero variance features (Religion, Ethnicity, Marital Status, Employment Status, JS9, and JS27) were removed. Features with a Pearson’s correlation coefficient greater than or equal to 0.8 were considered highly correlated, and one feature from each pair was removed based on domain knowledge and exploratory data analysis.

Feature selection.

Feature selection was performed using a hybrid approach combining Recursive Feature Elimination with Cross-Validation (RFECV) and Analysis of Variance (ANOVA). RFECV’s ability to capture complex non-linear relationships and feature interactions, and ANOVA’s statistical power in identifying individually significant features was utilized to strengthen the approach.

The RFECV algorithm was implemented with a classifier as the base estimator, using 5-fold cross-validation and balanced accuracy as the scoring metric. Through this process, the optimal number of features was determined to be 28, based on the point where feature addition no longer significantly improved cross-validation scores. Concurrently, ANOVA was performed to rank features according to their F-values, which measure the ratio of between-group to within-group variance, with higher values indicating stronger discriminative power.

Rather than selecting features from either method alone, we employed a union-based integration strategy. The top 28 features from each method were identified and combined, resulting in 39 unique features after accounting for overlap. This integration yielded 11 additional features that would have been missed by using either method in isolation, demonstrating the complementary nature of the two approaches.

To validate our hybrid selection approach, we conducted ablation studies which are presented in the Results section in Table 6. The ablations included features selected by RFECV alone, features selected by ANOVA alone, and our integrated feature set. Results showed that models trained on the integrated feature set achieved significantly higher predictive performance. This empirical evidence supports our claim that the hybrid approach extracts more predictive features than individual methods.

Table 6. Ablation results on the feature elimination and feature selection techniques (zero variance, RFECV, and ANOVA), compared to the original method (applying all three techniques).
Without zero variance Without RFECV Without ANOVA Original
Model Accuracy (%) F1-score (%) Accuracy (%) F1-score (%) Accuracy (%) F1-score (%) Accuracy (%) F1-score (%)
GaussianNB 70.97 ± 2.45 68.90 ± 2.50 72.58 ± 2.30 70.35 ± 2.40 72.58 ± 2.20 69.74 ± 2.30 70.97 ± 2.45 68.90 ± 2.50
DecisionTreeClassifier 72.58 ± 2.30 67.26 ± 2.20 67.74 ± 2.40 65.44 ± 2.50 70.97 ± 2.20 65.85 ± 2.30 74.19 ± 2.30 70.48 ± 2.20
RandomForestClassifier 83.87 ± 1.80 81.55 ± 1.70 88.71 ± 1.50 87.25 ± 1.60 80.65 ± 1.90 77.86 ± 1.80 83.87 ± 1.80 81.03 ± 1.70
AdaBoostClassifier 80.65 ± 2.00 77.23 ± 2.10 83.87 ± 1.80 81.55 ± 1.70 82.26 ± 1.90 79.43 ± 2.00 80.65 ± 2.00 77.23 ± 2.10
LGBMClassifier 77.42 ± 2.10 74.17 ± 2.20 83.87 ± 1.80 82.39 ± 1.90 82.26 ± 2.00 79.43 ± 2.10 77.42 ± 2.10 74.17 ± 2.20
XGBClassifier 80.65 ± 2.00 79.26 ± 2.10 85.48 ± 1.70 84.30 ± 1.80 83.87 ± 1.90 82.00 ± 2.00 80.65 ± 2.00 79.26 ± 2.10
LogisticRegression 88.71 ± 1.50 87.25 ± 1.60 88.71 ± 1.50 87.25 ± 1.60 77.42 ± 2.00 75.34 ± 2.10 88.11 ± 1.40 86.90 ± 1.50
SVC 87.11 ± 1.40 86.20 ± 1.50 88.71 ± 1.50 87.25 ± 1.60 79.03 ± 2.00 77.33 ± 2.10 88.71 ± 1.50 87.25 ± 1.60
KNeighborsClassifier 79.03 ± 2.00 75.69 ± 2.20 77.42 ± 2.10 73.44 ± 2.20 79.03 ± 2.10 74.96 ± 2.30 79.03 ± 2.00 75.69 ± 2.20
EnsembleModel (Soft Voting) 88.71 ± 1.50 87.25 ± 1.60 88.71 ± 1.50 87.25 ± 1.60 82.26 ± 2.00 80.82 ± 2.10 88.71 ± 1.50 87.25 ± 1.60
EnsembleModel (Hard Voting) 90.32 ± 1.40 89.20 ± 1.50 88.71 ± 1.50 87.25 ± 1.60 80.65 ± 2.00 79.61 ± 2.10 90.32 ± 1.40 89.20 ± 1.50
1D CNN 79.03 ± 2.00 74.13 ± 2.10 80.65 ± 1.90 79.61 ± 2.00 79.03 ± 2.10 78.07 ± 2.20 87.10 ± 1.70 86.18 ± 1.80
BERT 79.34 ± 2.10 76.12 ± 2.20 77.89 ± 2.30 73.21 ± 2.40 75.67 ± 2.10 70.54 ± 2.20 82.26 ± 2.00 79.43 ± 2.10
BioBERT 75.43 ± 2.20 73.62 ± 2.30 74.87 ± 2.10 71.29 ± 2.20 73.45 ± 2.30 70.83 ± 2.40 90.32 ± 1.40 88.93 ± 1.50
ClinicalBERT 76.12 ± 2.10 72.45 ± 2.20 75.31 ± 2.00 71.76 ± 2.10 74.58 ± 2.20 69.49 ± 2.30 83.87 ± 1.80 82.39 ± 1.90
DischargeBERT 77.23 ± 2.00 73.67 ± 2.10 76.45 ± 1.90 72.39 ± 2.00 75.82 ± 2.10 71.28 ± 2.20 87.10 ± 1.60 86.18 ± 1.70
COReBERT 74.11 ± 2.20 70.23 ± 2.30 73.58 ± 2.10 69.87 ± 2.20 72.34 ± 2.30 68.47 ± 2.40 82.26 ± 2.10 78.11 ± 2.20

Fig 6 shows the comparison of feature importances identified by RFECV and ANOVA F-values. The RFECV method identified JS12, JS35, and JS6 as the top three important features, while ANOVA identified JS6, JS12, and JS35 as the most significant features based on F-values. Both methods consistently highlighted the importance of job satisfaction-related features in predicting occupational stress.

Fig 6. Comparison of feature importances identified by (a) RFECV and (b) ANOVA F-values.

Fig 6

Fig 7 displays the 39 most important indicators of occupational stress extracted through this methodology. The top features include various aspects of job satisfaction (JS), job performance (JP), and demographic factors such as age group and working period. This comprehensive set of features provides a multifaceted view of the factors contributing to occupational stress.

Fig 7. The 39 most important indicators of occupational stress extracted through the methodology used in this study.

Fig 7

Hyperparameter optimization.

Hyperparameter optimization is a critical step in developing effective machine learning models. It involves finding the best set of hyperparameters that maximize model performance. Due to the high number of possible combinations, RandomizedSearchCV was employed instead of GridSearchCV to efficiently explore the hyperparameter space of the used machine learning, deep learning and large language models. Table 2 shows the models, selected hyperparameters, hyperparameter spaces explored for each model, and the selected hyperparameters of the models.

Table 2. Hyperparameter spaces and selected hyperparameters for machine learning, deep learning, and large language models.
Model Hyperparameter Hyperparameter Space Selected
Machine Learning Models
GaussianNB var_smoothing {1e-9, 1e-8, 1e-7} 1e-9
DecisionTreeClassifier max_depth {None, 10, 20, 30, 40, 50} 20
min_samples_split {2, 5, 10} 10
min_samples_leaf {1, 2, 4} 1
RandomForestClassifier n_estimators {100, 200, 300} 200
max_depth {None, 10, 20, 30} 30
min_samples_split {2, 5, 10} 5
min_samples_leaf {1, 2, 4} 2
AdaBoostClassifier n_estimators {50, 100, 150} 100
learning_rate {0.01, 0.1, 1.0} 0.1
LGBMClassifier num_leaves {31, 62, 127} 62
learning_rate {0.01, 0.1, 0.5} 0.1
n_estimators {100, 200, 300} 200
XGBClassifier n_estimators {100, 200, 300} 200
learning_rate {0.01, 0.1, 0.5} 0.1
max_depth {3, 6, 9} 6
LogisticRegression C {1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100, 1000} 1
solver {liblinear, lbfgs} liblinear
SVC C {0.1, 1, 10, 100} 10
kernel {linear, rbf, poly} rbf
KNeighborsClassifier n_neighbors {3, 5, 7, 9, 11} 7
weights {uniform, distance} distance
metric {euclidean, manhattan} euclidean
Deep Learning and Large Language Models
1D CNN batch_size {4, 8, 16} 4
learning_rate {0.001, 0.0001} 0.001
epochs {100, 200, 500} 500
patience {20, 50} 50
LLMs batch_size {8, 16, 32} 16
learning_rate {1e-5, 5e-5, 1e-4} 1e-5
epochs {100, 200} 200
patience {10, 20} 20
weight_decay {0.01, 0.1} 0.01
warmup_steps {10, 50} 50
gradient_accumulation_steps {5, 10} 10

Ensemble creation.

The ensemble learning involves combining multiple machine learning models to improve overall predictive performance. Ensembles leverage the strengths of individual models, reduce overfitting, and enhance generalization. In this study, an ensemble was created by selecting the three best-performing models: Random Forest Classifier, Logistic Regression, and Support Vector Classifier (SVC). The ensemble methods used were hard voting and soft voting.

Hard voting, also known as majority voting, involves taking the mode of the predicted classes from each individual model. Mathematically, for an ensemble of n models, the hard voting prediction y^ for an instance x can be represented as:

y^=mode(y^1,y^2,,y^n) (10)

where y^i is the prediction from the i-th model.

Soft voting involves averaging the predicted probabilities from each individual model and selecting the class with the highest average probability. Mathematically, the soft voting prediction y^ for an instance x is given by:

y^=argmaxc(1ni=1nPi(y=cx)) (11)

where Pi(y=cx) is the predicted probability of class c from the i-th model.

Natural language sentence generation from tabular data.

The process of generating natural language sentences from tabular data involved several methodical steps to convert survey responses into text format suitable for BERT-based models. The steps are as follows:

  1. Mapping Survey Responses to Text Equivalents: Each numerical survey response was mapped to its respective text equivalent. For example:
    • Likert Scale Mapping: 1: ‘strongly disagree’, 2: ‘disagree’, 3: ‘are neutral’, 4: ‘agree’, 5: ‘strongly agree’.
    • Income Mapping: Ranges from ‘less than RM2,500’ to ‘RM15,040 or more’.
    • Sperm Quality Mapping: Ranges from ‘normal’ to ‘azoospermia’.
    • Education Mapping: Ranges from ‘had no schooling’ to ‘has a doctorate’.
  2. Arranging Features in a Meaningful Sequence: Features were arranged logically to ensure coherent sentence generation, starting with household income and sperm quality, followed by job satisfaction factors, and ending with education level.

  3. Adding Meaningful Counterparts to the Sequence: Descriptive text was added to form complete sentences, integrating mapped responses with additional context from the original survey questions.

The resulting sentence structure followed this pattern: “The individual has a household income of [income] and sperm quality described as [sperm quality]. They [opinion on fair pay], [opinion on raises], ... They [education level] in their education level."

This process was systematically applied to each instance in the dataset, generating unique natural language sentences that encapsulated the survey responses. These sentences were then used as inputs for the BERT-based models to perform text classification. Fig 8a shows a frequency distribution plot of the length of the generated text chunks with a mean of approximately 2852 words. Fig 8b shows the frequency distribution of tokens per generated sentence, with a mean of about 555 tokens. Fig 8c shows the top 15 most frequent words, revealing that responses tend towards ‘disagree’ more than ‘agree’. By converting tabular data into natural language sentences, the study leveraged the language understanding capabilities of BERT-based models, enhancing the model’s ability to detect occupational stress from survey responses and providing a domain context for occupational stress detection.

Fig 8. Statistical details of the generated texts from tabular occupational stress data. (a) shows the frequency distribution of the length of the generated texts along with the mean value. (b) shows the frequency distribution of the length of tokens per generated text. (c) shows the 15 most frequent words in the generated texts.

Fig 8

Synthetic data generation.

This study employs four synthetic data generation algorithms from the Synthetic Data Vault (SDV) [69] library: Gaussian Copula [70], CTGAN [71], CopulaGAN [72], and TVAE [71]. Each algorithm models the joint distribution of the training dataset differently, aiming to generate synthetic samples that reflect the real data’s statistical properties. We detail each approach below, along with the hyperparameters used in this work.

1. Gaussian Copula Synthesizer

A copula is a function that describes the dependence structure between random variables separately from their marginal distributions. The Gaussian Copula Synthesizer first fits marginal distributions to each feature and then learns a correlation matrix assuming a multivariate Gaussian distribution in the latent space. By sampling from this Gaussian space and applying the inverse transforms of the marginals, it generates synthetic samples. Key parameters in our setup include:

  • enforce_min_max_values = True: Ensures generated values lie within observed data ranges.

  • enforce_rounding = False: Disables automatic rounding of numeric features, allowing for continuous output.

Although a Gaussian copula can effectively model moderate correlations, it may struggle with highly non-linear relationships in certain stress-related features.

2. CTGAN (Conditional Tabular GAN)

The CTGAN algorithm extends the vanilla GAN to handle mixed continuous and discrete tabular data by conditioning the generator on discrete column values during training. This training-by-sampling approach addresses the data imbalance among different categories. The hyperparameters used in this study are:

  • epochs = 500: Ensures the model has sufficient iterations to converge.

  • verbose = True: Prints out training progress, enabling monitoring of generator and discriminator losses.

CTGAN has been shown to capture complex distributions in tabular data, but it may require careful tuning to avoid mode collapse (where the generator produces samples covering only a subset of the real data distribution).

3. CopulaGAN (Hybrid Statistical-GAN Approach)

CopulaGAN combines copula-based transformations with a GAN architecture, intending to preserve global dependencies (like correlation structures) while also leveraging the generative capabilities of neural networks. Our hyperparameter choices mirror CTGAN, including:

  • epochs = 500

  • verbose = True

This hybrid approach can better handle non-linearities than a pure Gaussian Copula method, particularly for stress features that exhibit complex interdependencies (e.g., job satisfaction vs. stress level).

4. TVAE (Tabular Variational Autoencoder)

A Tabular Variational Autoencoder (TVAE) maps each real sample to a continuous latent space, then reconstructs it. Once trained, synthetic samples are created by sampling new latent vectors and decoding them back into the feature space. The main hyperparameters are:

  • epochs = 500: Provides sufficient training time for complex distributions.

  • enforce_min_max_values = True: Maintains data ranges observed in the real dataset.

  • enforce_rounding = False: Allows finer-grained numeric outputs.

TVAE can naturally capture continuous variations and subtle feature correlations, but it can underperform if the data has heavily skewed or multi-modal distributions unless carefully tuned.

All four algorithms rely on a Metadata object automatically detected from the real training data (features Xtrain and label ytrain). Each model runs for up to 500 epochs to ensure adequate convergence. We generate 1000 synthetic samples for each method to balance computational efficiency with distributional variety. The synthetic data is then split into Xsynthesized and ysynthesized, which we use to retrain or test our stress detection models. The generated synthetic data using each of these methods, along with the codebook for the data, is available as supporting information (S1DatasetS1 Dataset, S2DatasetS2 Dataset, S3DatasetS3 Dataset, S4DatasetS4 Dataset, and S5TextS5 Text).

Results

Experimental results

Table 3 summarizes the performance metrics (accuracy, macro-averaged F1-score, precision, recall, and ROC-AUC) for nine machine learning models, two ensemble models, one deep learning model, and five large language models. Fig 9 displays the confusion matrices for these models on the test set with random seed 42. To assess the statistical significance of performance differences between the best-performing Ensemble Model (Hard Voting) and other models, we conducted paired t-tests for each metric. Most differences were statistically significant (p-value < 0.05). However, for SVC (p-value = 1.53×101) and BioBERT (p-value = 3.48×101), some metrics showed no statistically significant difference from the Ensemble Model (Hard Voting). Additionally, we performed a comprehensive 10-fold cross-validation [73, 74] on the 11 statistical ML models to further validate the robustness and reliability of the obtained results. These results are presented using the confidence interval bar plots in Fig 10.

Table 3. Performance of the machine learning, deep learning and large language models used in this study for occupational stress detection. The best results for each performance metric is highlighted in bold.

Model Accuracy (%) F1-Score (%) Precision (%) Recall (%) ROC-AUC (%)
Machine Learning Models
GaussianNB 70.97 ± 2.45 68.90 ± 2.50 68.90 ± 2.30 68.90 ± 2.30 74.69 ± 2.10
DecisionTreeClassifier 74.19 ± 2.30 70.48 ± 2.20 73.07 ± 2.50 69.68 ± 2.40 76.70 ± 2.50
RandomForestClassifier 83.87 ± 1.80 81.03 ± 1.70 85.48 ± 1.60 79.15 ± 1.90 89.41 ± 1.60
AdaBoostClassifier 80.65 ± 2.00 77.23 ± 2.10 82.70 ± 1.90 75.70 ± 2.20 87.74 ± 1.80
LGBMClassifier 77.42 ± 2.10 74.17 ± 2.30 77.12 ± 2.20 73.13 ± 2.30 87.29 ± 1.80
XGBClassifier 80.65 ± 2.00 79.26 ± 2.10 79.26 ± 2.10 79.26 ± 2.00 87.40 ± 1.70
LogisticRegression 88.11 ± 1.40 86.90 ± 1.50 90.55 ± 1.40 86.85 ± 1.60 87.29 ± 1.60
SVC 88.71 ± 1.50 87.25 ± 1.60 90.40 ± 1.50 85.67 ± 1.70 86.29 ± 1.60
KNeighborsClassifier 79.03 ± 2.00 75.69 ± 2.20 79.76 ± 2.10 74.41 ± 2.20 76.92 ± 2.10
Ensemble Model (Soft Voting) 88.71 ± 1.50 87.25 ± 1.60 90.40 ± 1.50 85.67 ± 1.70 86.73 ± 1.60
Ensemble Model (Hard Voting) 90.32 ± 1.40 89.20 ± 1.50 91.55 ± 1.40 87.85 ± 1.60 87.85 ± 1.50
Deep Learning Model
1D CNN 87.10 ± 1.70 86.18 ± 1.80 86.18 ± 1.70 86.18 ± 1.80 86.18 ± 1.70
Large Language Models
BERT 82.26 ± 2.00 79.43 ± 2.10 83.97 ± 2.00 77.87 ± 2.10 77.87 ± 2.00
BioBERT 90.32 ± 1.40 88.93 ± 1.50 93.33 ± 1.40 86.96 ± 1.50 86.96 ± 1.40
ClinicalBERT 83.87 ± 1.80 82.39 ± 1.90 83.16 ± 1.80 81.83 ± 1.90 81.83 ± 1.80
DischargeBERT 87.10 ± 1.60 86.18 ± 1.70 86.18 ± 1.60 86.18 ± 1.70 86.18 ± 1.60
COReBERT 82.26 ± 2.10 78.11 ± 2.20 89.00 ± 2.10 76.09 ± 2.20 76.09 ± 2.10

Fig 9. Confusion matrix for the 17 models, sequenced a through q according to the sequence in.

Fig 9

Table 3.

Fig 10. Results from 10-fold cross-validation (five different random seeds) for Accuracy (a), F1-score (b), Precision (c), Recall (d), and ROC-AUC (e) metrics across the 11 machine learning models. The error bars indicate standard deviation across folds, highlighting model stability and reliability.

Fig 10

Analysis of results

The experimental results in Table 3 and confusion matrices in Fig 9 provide a comprehensive overview of model performance for occupational stress detection. Given the dataset’s imbalanced nature, we focus on the macro-averaged F1-score as a key metric. Our primary findings are as follows: (1) The Ensemble Model (Hard Voting) achieved superior performance (accuracy: 90.32%, macro-F1: 89.20%), outperforming all individual models. This exceptional performance can be attributed to the ensemble’s ability to leverage the strengths of multiple base models, thereby reducing bias and variance, and improving generalization. (2) Among individual machine learning models, SVC demonstrated comparable effectiveness to the Ensemble Model (accuracy: 88.71%, macro-F1: 87.25%). This strong performance can be explained by SVC’s ability to effectively handle high-dimensional data and capture complex non-linear relationships between features. RandomForestClassifier and LogisticRegression also exhibited robust performance (macro-F1: 81.03% and 86.90%, respectively). (3) The 1D CNN deep learning model showed competitive performance (accuracy: 87.10%, macro-F1: 86.18%), comparable to several strong individual machine learning models. This result underscores the potential of deep learning approaches in capturing local patterns and hierarchical features in the input data for occupational stress detection. (4) Among large language models, BioBERT achieved the highest performance (accuracy: 90.32%, macro-F1: 88.93%), matching the Ensemble Model. This exceptional performance can be attributed to BioBERT’s pre-training on large-scale biomedical corpora. DischargeBERT also showed strong results (accuracy: 87.10%, macro-F1: 86.18%). (5) The comparable performance of the best machine learning model (Ensemble Model) and the best large language model (BioBERT) suggests that both approaches are highly effective in detecting occupational stress, albeit through different mechanisms. (6) The ROC-AUC scores corroborate our findings, with the top-performing models displaying the highest AUC values. This indicates superior discriminative ability between classes and further validates the effectiveness of these models in handling the complexities of occupational stress detection. (7) The cross-validation analysis depicted in Fig 10 further substantiates these findings. The cross-validation performance metrics exhibited similar variance to the holdout validation results of Table 3, but slightly lower overall performance due to the inherent nature of repeated fold-based evaluations. The Ensemble Model (Hard Voting) consistently showed the highest accuracy and macro-averaged F1-scores across multiple random seeds and performance metrics. This consistency enhances the reliability and robustness of these models in varied evaluation contexts, highlighting their suitability for practical deployment in occupational stress detection systems.

Comparison with existing methods

To benchmark our proposed model against existing state-of-the-art techniques, we evaluate the performance of eight recent methods on our dataset under identical conditions. Table 4 summarizes the performance metrics for each method, including accuracy, F1-score, precision, recall, and ROC-AUC. We report the performance of the best model reported in each study, and compare them with our best-performing model.

Table 4. Performance comparison of our method with recent state-of-the-art methods on the same dataset.

Method Model Used Accuracy (%) F1-Score (%) Precision (%) Recall (%) ROC-AUC (%)
[44] Random Forest 80.61 ± 1.56 85.23 ± 1.46 81.89 ± 1.43 89.32 ± 1.60 84.43 ± 1.51
[46] Logistic Regression 78.01 ± 1.49 82.98 ± 1.43 81.50 ± 1.47 85.16 ± 1.52 82.52 ± 1.49
[47 XGBoost Classifier 76.08 ± 1.58 81.30 ± 1.47 80.45 ± 1.42 82.63 ± 1.54 81.62 ± 1.43
[48] XGBoost Classifier 83.87 ± 1.50 87.18 ± 1.40 89.47 ± 1.45 85.00 ± 1.55 91.36 ± 1.48
[51] Neural Networks 77.42 ± 1.55 74.80 ± 1.45 75.48 ± 1.40 74.32 ± 1.58 80.91 ± 1.52
[49] Random Forest 87.10 ± 1.45 85.91 ± 1.42 85.91 ± 1.41 85.91 ± 1.53 94.09 ± 1.47
[50] Random Forest 77.42 ± 1.57 73.44 ± 1.48 76.63 ± 1.44 72.27 ± 1.59 80.03 ± 1.50
[26] Ensemble (RF, GB, LGB) 75.81 ± 1.54 73.84 ± 1.44 74.09 ± 1.46 73.63 ± 1.57 84.28 ± 1.46
Ours Ensemble (RF, LR, SVC) 90.32 ± 1.40 89.20 ± 1.50 91.55 ± 1.40 87.85 ± 1.60 87.85 ± 1.50

Our Ensemble Model (Hard Voting) achieved the highest accuracy and F1-score among all compared methods, demonstrating superior performance in detecting occupational stress, particularly in an imbalanced dataset scenario. While [49] and [48] reported a higher ROC-AUC, indicating strong discriminative ability, it did not surpass our model in terms of accuracy and F1-score, which are critical metrics for imbalanced data.

The enhanced performance of our method can be attributed to various crucial factors. For instance, [44, 46, 47], and [50] did not utilize advanced feature selection techniques, data preprocessing steps, or ensemble learning, such as in this study. [48] used robust pre-processing steps but did not use ensemble learning or advanced feature selection techniques with RFECV or ANOVA, like in this study. [51] focused on architecture development with advanced techniques like Mixture of Experts (MoE), which outperformed all neural network based solutions, but not traditional ML based approaches. [49] used RFECV with eight other feature selection methods, but did not include ANOVA feature ranking in the process. This testifies the necessity of including both feature ranking and feature importance techniques in salient feature extraction. Similar observation can be drawn from [26], where only RFECV is used, but not ANOVA. These results also complement the results of ablation studies presented in Table 6. Thus, the superiority of our approach lies in a robust data-processing pipeline, meticulous feature selection process which ensures that only the most informative features contribute to the model, and ensemble learning, which integrates multiple classifiers to reduce biases and variances in the data. This combination allows our model to capture complex patterns associated with occupational stress more effectively than existing state-of-the-art methods.

To sum up, the significant improvements in our proposed methodology over prior works primarily stem from three key advancements: (1) an innovative feature selection strategy that combines RFECV and ANOVA feature ranking methods, (2) a robust ensemble strategy integrating multiple classifiers, and (3) the systematic preprocessing pipeline optimized for handling complex occupational survey data. This combination effectively enhances the discriminative power of the models and ensures more reliable and interpretable predictions. Furthermore, unlike previous studies, our approach leverages the strengths of large language models, specifically BioBERT, which achieved exceptional performance due to its targeted pre-training on biomedical data. This integration of multi-domain AI techniques sets our method apart and enables superior predictive capability and practical usability in occupational stress detection.

External validation with synthetic data generation

Quality assessment of the generated data.

To evaluate the fidelity of our synthetic datasets, we performed three primary comparisons between real data and four synthetic data variants (TVAE, CopulaGAN, Gaussian Copula, and CTGAN): (1) a two-dimensional Principal Component Analysis (PCA) projection, (2) Pearson’s correlation heatmaps of ten most critical features (as identified by RFECV and ANOVA), and (3) box plots of the same features.

PCA Analysis. Fig 11 illustrates the 2D PCA projection of real and synthetic samples. The real data cluster (red) is generally surrounded by the synthetic data points. Gaussian Copula (purple) and CopulaGAN (green) appear to overlap most with the real distribution in the central region, while TVAE (blue) captures an adjacent cluster structure. CTGAN (orange) demonstrates a wider scatter, suggesting that it models the global distribution but may produce outlier samples. Overall, the PCA visualization indicates that none of the synthetic approaches perfectly replicates the real data manifold, though Gaussian Copula and CopulaGAN show promising overlap.

Fig 11. PCA projection of real vs. synthetic data (2D). Each point represents a sample projected onto the first two principal components (PC1 and PC2). The real data (red) and four synthetic datasets (blue, green, orange, purple) show overlapping but distinct clusters. Gaussian Copula (purple) and CopulaGAN (green) appear to capture central density regions well, while CTGAN (orange) retains a broader spread, indicating variability in capturing nuanced stress indicators.

Fig 11

Correlation Structure. Fig 12 presents Pearson’s correlation matrices for the ten most important features. In the real data (Fig 12(a)), notable positive correlations exist among JS23 (Excessive Workload), JS35 (organizational Unawareness), and JS36 (Poor Communication), emphasizing their collective roles in occupational stress. TVAE (Fig 12(b)) captures some of these relationships (e.g., JS23–JS36) but amplifies others (JS35–JS36). CopulaGAN (c) closely approximates the real correlation for most pairs, especially around Household income and Sperm quality. Gaussian Copula (d) shows generally consistent, albeit slightly weaker, correlations than the real data. CTGAN (e) captures overall trends but smooths out the extremes, reducing the correlation magnitude in some feature pairs. These observations confirm that no single generative method fully replicates the intricate linear dependencies observed in the real dataset, though CopulaGAN and Gaussian Copula show comparatively better alignment.

Fig 12. Pearson correlation heatmaps for ten critical features. (a) Real Data, (b) TVAE, (c) CopulaGAN, (d) Gaussian Copula, and (e) CTGAN. Red cells indicate high positive correlations, while blue cells indicate negative or low correlations.

Fig 12

Distribution Comparisons via Box Plots. Fig 13 highlights distributional differences across the same ten critical features. Real data (first box) generally shows moderate dispersion with some outliers (e.g., JS36). CopulaGAN and Gaussian Copula often produce narrower interquartile ranges (IQR), potentially underestimating the real data’s variability. TVAE exhibits occasional outlier inflation (e.g., Household income, JS17), which might reflect its higher capacity to capture tail distributions but can also introduce artificial extremes. CTGAN’s boxes largely overlap with the real data for many features, though it exhibits slightly skewed distributions in others (e.g., JS11). Collectively, these box plots highlight the trade-offs each method faces in replicating the full range of observed stress-related feature variability.

Fig 13. Box plot comparison for key occupational stress features. The green line in each box indicates the median, while the box bounds represent the 25th–75th percentiles. Black points denote outliers. Each sub-figure (a)–(j) corresponds to a specific feature: e.g., JS23, JS36, JS35, JS21, JS33, JS17, JS11, Household income, Sperm quality, and Educational level.

Fig 13

These three complementary visual assessments (PCA, correlation matrices, and box plots), indicate that CopulaGAN and Gaussian Copula replicate the central distribution and correlation structure reasonably well, whereas TVAE and CTGAN capture some aspects of the real data distribution but exhibit either tail inflation or smoothed variability. All methods partially preserve key occupational stress relationships but still deviate from real-world complexity.

Results on synthetic data.

After confirming basic quality indicators, we employed each synthetic dataset to train our ensemble classifier and tested on the real data. Table 5 presents the comparative performance across different synthetic data generation approaches. The Gaussian Copula method demonstrated superior performance (85.48% accuracy) when training on synthetic data and testing on real data, suggesting its effectiveness in capturing the underlying data distribution. Notably, TVAE showed remarkable robustness in both scenarios, achieving the highest performance (89.00% accuracy) when testing on synthetic data.

Table 5. Performance comparison of ensemble models on synthetic and real data scenarios (results in %).
Method Model Precision Recall Macro F1 Accuracy ROC-AUC
Training on Synthetic Data, Testing on Real Data
Gaussian Copula Hard Voting 86.49 ± 1.42 82.22 ± 1.65 83.60 ± 1.54 85.48 ± 1.48 82.22 ± 1.62
Soft Voting 83.97 ± 1.56 80.94 ± 1.72 82.00 ± 1.63 83.87 ± 1.55 87.51 ± 1.45
CTGAN Hard Voting 69.19 ± 1.82 69.79 ± 1.93 69.41 ± 1.88 70.97 ± 1.75 69.79 ± 1.86
Soft Voting 67.32 ± 1.78 67.61 ± 1.89 67.45 ± 1.84 69.35 ± 1.72 78.71 ± 1.64
TVAE Hard Voting 80.92 ± 1.58 81.44 ± 1.67 81.16 ± 1.62 82.26 ± 1.53 81.44 ± 1.65
Soft Voting 80.92 ± 1.59 81.44 ± 1.68 81.16 ± 1.63 82.26 ± 1.54 84.84 ± 1.48
Copula GAN Hard Voting 71.53 ± 1.76 72.85 ± 1.85 71.69 ± 1.80 72.58 ± 1.70 72.85 ± 1.83
Soft Voting 74.36 ± 1.72 75.42 ± 1.83 74.69 ± 1.77 75.81 ± 1.68 79.38 ± 1.62
Training on Real Data, Testing on Synthetic Data
Gaussian Copula Hard Voting 71.62 ± 1.75 56.92 ± 1.95 53.33 ± 1.88 67.50 ± 1.73 56.92 ± 1.92
Soft Voting 70.28 ± 1.77 56.49 ± 1.96 52.76 ± 1.89 67.10 ± 1.74 69.79 ± 1.82
CTGAN Hard Voting 50.50 ± 1.92 50.31 ± 2.05 45.94 ± 1.98 52.30 ± 1.85 50.31 ± 2.02
Soft Voting 51.40 ± 1.90 50.80 ± 2.03 45.84 ± 1.96 52.90 ± 1.83 55.46 ± 1.95
TVAE Hard Voting 88.17 ± 1.45 72.99 ± 1.82 77.56 ± 1.65 89.00 ± 1.42 72.99 ± 1.80
Soft Voting 87.55 ± 1.46 73.40 ± 1.81 77.82 ± 1.64 89.00 ± 1.42 93.85 ± 1.38
Copula GAN Hard Voting 55.45 ± 1.88 51.79 ± 2.01 43.37 ± 1.95 54.45 ± 1.82 51.79 ± 1.98
Soft Voting 56.52 ± 1.87 51.93 ± 2.00 42.98 ± 1.94 54.65 ± 1.81 57.65 ± 1.93

The loss curves for GAN-based methods (Fig 14) reveal interesting convergence patterns. The CTGAN model shows stable convergence after approximately 300 epochs, with generator and discriminator losses stabilizing around -0.5 and 0.5 respectively, indicating a well-balanced adversarial training process. The Copula GAN exhibited more volatile training dynamics, with wider oscillations in both generator and discriminator losses, yet achieved better performance than pure CTGAN, suggesting that the hybrid approach better captures the complexity of stress patterns.

Fig 14. Training convergence patterns for GAN-based synthetic data generation methods. (a) shows the loss curves for CTGAN. (b) shows the loss curves for Copula GAN.

Fig 14

These results have significant implications for this study. The strong performance on synthetic data validates our model’s generalizability to unseen stress patterns, improving the acceptability and deployability of the models in real-world workplace safety monitoring systems. The results also have implications for synthetic data research in general, showing that statistical methods (Gaussian Copula) prove more reliable for generating training data, while neural approaches (TVAE) excel at generating test scenarios, particularly for tabular data. The hybrid Copula GAN approach offers a balanced trade-off between statistical reliability and deep learning capabilities. The consistent performance across different synthetic data scenarios suggests robust stress detection capabilities in varied workplace contexts.

Limitations and proposed improvements for synthetic data generation.

Although synthetic data generation techniques expand our capacity to simulate varied workplace stress scenarios, they also pose certain limitations:

  • Underrepresentation of Rare Events: Sparse but high-impact stress factors (e.g., extremely high workload) may not be faithfully reproduced.

  • Overreliance on Linear Correlations: Methods like Gaussian Copula can miss non-linear dependencies inherent in stress-related phenomena.

  • Hyperparameter Sensitivity: GAN-based models (CTGAN, CopulaGAN) can exhibit mode collapse or overfitting if not carefully tuned.

  • Context Gaps: Synthetic data lacks real-world nuances (e.g., regulatory changes or cultural shifts), limiting generalizability.

To address these issues, future work can explore (1) diffusion models or normalizing flows for more accurate tail modeling, (2) hybrid training approaches merging real minor-class samples with synthetic data to reduce imbalance, (3) domain-specific priors encoding known workplace patterns, and (4) semi-supervised or transfer learning strategies to improve performance across diverse occupational environments. Ultimately, while synthetic data approaches offer a promising avenue for robust model evaluation, ongoing refinements are needed to ensure they faithfully reflect complex, real-world workplace dynamics.

Ablation studies

To evaluate the robustness of our workplace safety monitoring framework and understand the contribution of different components, we conducted two comprehensive ablation studies:

Feature selection and elimination techniques.

Table 6 presents the impact of removing different feature selection components. The results revealed critical insights for safety monitoring systems:

  • Zero-variance feature removal had minimal impact, suggesting redundancy in basic demographic indicators

  • RFECV elimination significantly impacted model performance, with accuracy drops of 8–12% for the Ensemble Model and SVC, highlighting its importance in identifying safety-critical features

  • ANOVA’s exclusion led to substantial degradation (10–15%) in LogisticRegression and SVC performance, emphasizing its role in capturing stress-safety relationships

  • Deep learning and language models showed the highest sensitivity to feature selection, with performance drops of up to 20%, indicating their reliance on well-curated safety indicators

A key contribution of our method is the empirical demonstration that integrating multiple feature selection techniques results in more predictive features than relying on any single method alone. This can be explained by two key factors. First, ANOVA (a univariate method) evaluates how strongly each feature individually separates the classes, effectively identifying stress-safety relationships at the per-feature level. Second, RFECV (a multivariate method) iteratively ranks and prunes features based on their collective contribution to classification performance, capturing complex interactions that univariate tests may overlook. By combining ANOVA’s ability to detect individually discriminative features with RFECV’s strength in assessing feature sets holistically, our integrated approach ensures the retention of both high-impact individual features and contextually significant feature interactions.

This explains why omitting either RFECV or ANOVA consistently degrades performance across models, especially in machine learning algorithms (e.g., LogisticRegression, SVC) that rely on well-crafted input spaces for robust decision boundaries. Meanwhile, deep learning methods (1D CNN) and large language models (BioBERT, etc.) appear even more sensitive to missing relevant features due to their capacity to learn higher-level abstractions. Without high-quality initial inputs, these models cannot fully leverage their representational power. Hence, the union of these complementary selection methods captures a broader spectrum of safety-relevant signals in the data, leading to the observed 5–10% performance gains and underscoring the necessity of thorough feature engineering in occupational stress detection.

Feature group contribution analysis.

Table 7 shows the impact of removing different feature groups, providing insights for safety monitoring system design:

Table 7. Ablation results on the feature groups (job performance, sperm quality and sociodemographic features), compared to the original results with all the groups selected.
Without job performance Without sperm quality Without sociodemographics Original
Model Accuracy (%) F1-score (%) Accuracy (%) F1-score (%) Accuracy (%) F1-score (%) Accuracy (%) F1-score (%)
GaussianNB 72.58 ± 2.20 69.74 ± 2.30 70.97 ± 2.40 68.90 ± 2.50 72.58 ± 2.10 70.35 ± 2.20 70.97 ± 2.45 68.90 ± 2.50
DecisionTreeClassifier 74.19 ± 2.10 70.48 ± 2.20 70.97 ± 2.30 67.68 ± 2.40 77.42 ± 2.20 73.44 ± 2.30 74.19 ± 2.30 70.48 ± 2.20
RandomForestClassifier 82.26 ± 1.90 78.81 ± 2.00 83.87 ± 1.80 81.55 ± 1.90 87.10 ± 1.70 85.60 ± 1.80 83.87 ± 1.80 81.03 ± 1.70
AdaBoostClassifier 82.26 ± 2.00 79.43 ± 2.10 80.65 ± 2.00 77.23 ± 2.10 80.65 ± 2.00 77.23 ± 2.10 80.65 ± 2.00 77.23 ± 2.10
LGBMClassifier 82.26 ± 2.10 79.96 ± 2.20 79.03 ± 2.10 76.32 ± 2.20 83.87 ± 1.90 82.00 ± 2.00 77.42 ± 2.10 74.17 ± 2.20
XGBClassifier 75.81 ± 2.20 69.03 ± 2.30 80.65 ± 2.00 78.86 ± 2.10 85.48 ± 1.80 84.58 ± 1.90 80.65 ± 2.00 79.26 ± 2.10
LogisticRegression 88.71 ± 1.50 87.25 ± 1.60 88.71 ± 1.50 87.25 ± 1.60 79.03 ± 2.00 76.86 ± 2.10 88.11 ± 1.40 86.90 ± 1.50
SVC 88.71 ± 1.50 87.25 ± 1.60 88.71 ± 1.40 87.25 ± 1.50 85.48 ± 1.80 83.98 ± 1.90 88.71 ± 1.50 87.25 ± 1.60
KNeighborsClassifier 79.03 ± 2.00 75.69 ± 2.10 80.65 ± 2.00 77.23 ± 2.10 75.81 ± 2.20 72.67 ± 2.30 79.03 ± 2.00 75.69 ± 2.20
EnsembleModel (Soft Voting) 88.71 ± 1.50 87.25 ± 1.60 88.71 ± 1.50 87.25 ± 1.60 82.26 ± 2.00 80.82 ± 2.10 88.71 ± 1.50 87.25 ± 1.60
EnsembleModel (Hard Voting) 88.71 ± 1.50 87.25 ± 1.60 87.10 ± 1.40 86.18 ± 1.50 80.65 ± 2.00 79.61 ± 2.10 90.32 ± 1.40 89.20 ± 1.50
1D CNN 82.26 ± 1.90 81.16 ± 2.00 83.87 ± 1.80 82.00 ± 1.90 80.65 ± 2.00 79.61 ± 2.10 87.10 ± 1.70 86.18 ± 1.80
BERT 83.34 ± 2.20 80.23 ± 2.30 76.12 ± 2.10 77.65 ± 2.20 75.98 ± 2.30 76.34 ± 2.40 82.26 ± 2.00 79.43 ± 2.10
BioBERT 80.67 ± 2.10 73.21 ± 2.20 74.12 ± 2.00 76.03 ± 2.10 73.54 ± 2.20 80.54 ± 2.30 90.32 ± 1.40 88.93 ± 1.50
ClinicalBERT 81.23 ± 2.20 72.87 ± 2.30 80.65 ± 2.10 71.45 ± 2.20 79.76 ± 2.30 70.12 ± 2.40 83.87 ± 1.80 82.39 ± 1.90
DischargeBERT 82.45 ± 2.10 73.54 ± 2.20 81.98 ± 2.00 72.76 ± 2.10 80.34 ± 2.20 71.78 ± 2.30 87.10 ± 1.60 86.18 ± 1.70
COReBERT 79.21 ± 2.30 71.45 ± 2.40 78.67 ± 2.20 70.98 ± 2.30 77.45 ± 2.30 69.12 ± 2.40 82.26 ± 2.10 78.11 ± 2.20
  • Job performance features proved crucial, with their removal causing 5-8% performance decline in most models

  • Health-related features showed minimal impact on stress detection accuracy, supporting their optional inclusion in workplace safety monitoring

  • Sociodemographic features significantly influenced model performance (7-12% impact), suggesting their importance in contextualizing workplace stress patterns

These results emphasize the need for comprehensive feature sets in workplace safety monitoring systems, particularly those capturing job performance and sociodemographic factors.

Explainable AI for safety-critical feature analysis

In this study, we employed two primary Explainable AI (XAI) techniques, SHAP and LIME, to gain both global and local interpretability into the model’s decision-making process. We used the SHAP library and the LIME library in Python to generate the visualizations and explanations. Specifically, SHAP summary plots (e.g., Fig 15(a)) provided an overview of each feature’s contribution across the entire dataset (global interpretation), while LIME bar charts (e.g., Fig 15(b)) and SHAP force plots (Figs 17 and 18) offered local explanations for individual predictions.

Fig 15. Top 20 features considered most important for occupational stress detection by (a) SHAP and (b) LIME.

Fig 15

Fig 17. Predictions for the stressed class of a randomly selected instance using (a) LIME and (b) SHAP. Here, the bars (LIME) and SHAP values indicate strong positive attributions for high-stress features such as workload (JS23) and unclear assignments (JS36), suggesting a higher probability of the stressed class.

Fig 17

Fig 18. Predictions for the not stressed class of a randomly selected instance using (a) LIME and (b) SHAP. In this low-stress example, features like positive colleague relationships (JS25) and good communication (JS33) have negative attributions (LIME) or negative SHAP values, pulling the prediction toward the not stressed class.

Fig 18

Fig 15 presents the top 20 global features ranked by both SHAP and LIME, revealing a strong alignment in their identified importance. Although both approaches focus on feature importance, SHAP allowed us to visualize how each feature value (red = higher feature value, blue = lower feature value) pushes the model’s prediction toward or away from “stressed,” whereas LIME provided interpretable local surrogates that explain how small perturbations in individual features influence the prediction outcome. By comparing these global and local explanations, we gained deeper insight into the interplay among the most critical stress-related variables in the model’s decision-making process.

To derive a more interpretable measure of each feature’s contribution to the final prediction, we employed both SHAP and LIME on the best-performing hard-voting ensemble model. We computed (a) the mean absolute SHAP value for each feature and (b) an aggregated LIME weight by averaging local explanations for individual samples. Each feature’s percentage contribution was then obtained by normalizing its SHAP and LIME values against the sum of the top 20 feature importances. Finally, we computed an average of these two metrics (SHAP and LIME) to obtain the Combined (average) SHAP & LIME percentage. Fig 16 presents the top 20 features with their combined percentage contributions, categorized by their role in either inducing or mitigating occupational stress.

Fig 16. Top 20 features influencing occupational stress, categorized by stress-inducing (red), stress-mitigating (green), and neutral (yellow) factors. Features are sorted by their combined SHAP & LIME percentage contribution. Each bar represents a feature’s relative contribution among the top 20 predictors. Higher percentages indicate greater influence on model predictions.

Fig 16

As visualized in the figure, the predictors of occupational stress can be grouped into distinct categories that reveal important workplace dynamics. Our visualization categorizes features mainly into stress-inducing (red) and stress-mitigating (green) factors, with their relative contributions clearly displayed.

Stress-inducing factors (safety risk indicators).

The visualization in Fig 16 reveals several prominent stress-inducing categories:

  • Excessive Workload and Ambiguity: Three features in this category collectively account for over 27% of the predictive power, with JS23 (“I have too much to do at work”) emerging as the single most influential predictor at 12.88%. This dominance underscores how overwhelming workload and bureaucratic constraints (JS21, JS24) significantly contribute to occupational stress.

  • Poor Communication: Two communication-related factors (JS36, JS35) together contribute nearly 17% to the model predictions, appearing as the second and fourth most important features. This highlights how unclear assignments and organizational awareness gaps create substantial psychological strain in the workplace.

  • Co-Worker & Supervisory Issues: The visualization shows two interpersonal factors (JS26, JS11) that contribute over 7% combined. These relationships represent critical stress vectors when dysfunctional.

  • Financial Concerns: JS2 (“Raises are too few and far between”) appears prominently among stress-inducing factors, indicating how compensation dissatisfaction contributes to overall strain.

  • Personal Factors: While less prominent than organizational variables, fertility-related stress (JP4) still appears among the top contributors, demonstrating how personal health factors can compound workplace stress.

Stress-mitigating factors (safety protective elements).

The visualization also identifies several categories of protective factors that buffer against occupational stress:

  • Positive Work Environment: Our visualization reveals four features in this category (JS7, JS4, JS12, JS25) collectively contributing over 16% to the model predictions. When employees perceive fair advancement opportunities and have positive relationships with supervisors and colleagues, stress is substantially mitigated.

  • Recognition and Rewards: Three features related to recognition and benefits (JS17, JS15, JS14) appear prominently in the visualization, together accounting for approximately 13% of predictive power. This emphasizes how reward systems act as important psychological buffers.

  • Promotional and Job Contentment: Three features related to job satisfaction (JS6, JS30, JS31) are visualized as protective factors, highlighting how intrinsic job enjoyment can offset other workplace stressors.

  • Physiological Factors: Sperm quality appears as a neutral factor with minimal contribution (1.67%) compared to workplace variables, reinforcing that organizational elements predominantly drive occupational stress predictions.

Implications for Interventions: As clearly visualized in Fig 16, workload management and communication clarity represent the most promising targets for stress reduction initiatives, together accounting for over 44% of the combined feature importance. The color-coded visualization provides stakeholders with an intuitive understanding of which factors increase stress (red) versus which provide protective benefits (green). This evidence-based approach enables targeted interventions focused on the most influential organizational factors rather than individual health variables.

Overall, this visualization demonstrates that occupational stress is predominantly determined by organizational factors rather than individual characteristics. While personal factors like fertility issues play some role, the overwhelming influence comes from workload, communication, and workplace relationships. This suggests that organizational-level interventions addressing these specific dimensions will likely yield the greatest returns for employee wellbeing.

Figs 17 and 18 illustrate local explanations for two distinct instances: one classified as stressed (Fig 17) and one as not stressed (Fig 18). The LIME bar charts on the left (labeled “(a)”) show how each feature shifts the predicted probability toward or away from the stressed class, while the SHAP summary plots on the right (labeled “(b)”) depict individual feature attributions (horizontal axis) for the same instance. In Fig 17 (stressed case), red bars/features (LIME) and positive SHAP values collectively push the prediction toward ‘stressed,’ with “Excessive workload (JS23)” and “Unclear work assignments (JS36)” displaying the largest positive contributions. In contrast, Fig 18 (not stressed case) highlights how features such as “Good organizational communication (JS33)” and “Positive colleague relationships (JS25)” counterbalance stress by pushing the model toward the not stressed class. These local explanations demonstrate not only which features dominate the model’s decision at an individual level but also how combinations of factors can compound risk or offer protective effects.

Fig 18 shows a low-stress case where positive workplace factors create a safety-promoting environment. Again, the LIME explanation in panel (a) highlights the local feature contributions, while the SHAP summary in panel (b) confirms the direction and magnitude of these effects. Strong colleague relationships (JS25) and job satisfaction (JS32) collectively reduce stress by 30%, demonstrating how mitigating factors can offset stressors. Notably, the green bars in LIME and the negative SHAP values both indicate protective features that lower stress risk.

This bidirectional analysis of both global (Fig 15) and local (Figs 17 and 18) feature contributions provides actionable insights for workplace safety management. For instance, organizations should prioritize workload optimization (JS23) and clear task assignments (JS36) to reduce accident risks, while also fostering supportive communication (JS33) and recognition systems (JS17) as protective measures against stress-induced safety incidents. By understanding how these factors synergize or counterbalance at the individual level, safety managers can tailor interventions that effectively target stress mitigation and enhance overall workplace safety.

Model deployment for workplace safety management

To facilitate practical implementation of stress-based safety monitoring, we deployed our model on the Hugging Face platform using Gradio (https://huggingface.co/spaces/JnS123456/Occupational_stress_detection). This deployed model provides the following benefits:

  • Real-time stress level assessment for safety monitoring

  • User-friendly interface for regular safety checks

  • Immediate feedback for proactive safety intervention

  • Integration capabilities with existing safety management systems

Fig 19 demonstrates the practical implementation of our model through the Hugging Face platform. The interface allows safety managers to input workplace stress indicators and receive immediate predictions. Fig 19a shows the model accurately identifying high-stress scenarios that require immediate safety intervention, while Fig 19b demonstrates the detection of normal stress levels indicating safe working conditions. This real-time assessment capability enables organizations to implement proactive stress monitoring as part of their comprehensive workplace safety programs, facilitating timely interventions before stress-related safety incidents can occur.

Fig 19. Deployed model interface on Hugging Face platform showing stress detection capabilities for workplace safety monitoring. (a) Model prediction interface showing detection of high occupational stress, triggering safety intervention alerts. (b) Model prediction interface demonstrating detection of manageable stress levels, indicating safe working conditions.

Fig 19

Discussion

Methodological significance

This study presents several significant advancements over existing research in occupational stress detection. First, our ensemble-based model integrating Random Forest, Logistic Regression, and Support Vector Classifier demonstrated superior predictive performance, achieving the highest accuracy (90.32%) and macro-averaged F1-score (89.20%) among evaluated methods. This performance notably exceeds previous state-of-the-art approaches that relied on single-model techniques or less comprehensive ensemble strategies.

Second, the robust hybrid feature selection approach, combining Recursive Feature Elimination with Cross-Validation (RFECV) and Analysis of Variance (ANOVA), significantly improved model performance by ensuring that both individually significant and interaction-sensitive features were included. Ablation studies confirmed the complementary nature of these methods, demonstrating that their integration captures critical occupational stress indicators overlooked by single-method approaches.

Third, leveraging natural language sentence generation from tabular survey data enabled the effective utilization of pre-trained large language models, notably BioBERT, which performed comparably to our best ensemble model. This data transformation approach facilitated deep contextual understanding, bridging the gap between traditional tabular analysis and advanced language models, thus expanding the methodological landscape for occupational stress detection.

Collectively, these contributions establish our method’s superiority in predictive accuracy, interpretability, and generalizability, making it particularly suitable for real-world deployment in workplace safety management systems.

Implications for workplace safety

The study has significant implications for various stakeholders in workplace safety management:

Implications for Safety Management Systems:

  • Real-time Monitoring: The deployed model enables continuous stress monitoring as part of safety management systems, addressing the reactive nature of traditional approaches noted by [46].

  • Risk Assessment: Integration of stress detection into safety protocols allows early identification of high-risk situations, particularly when multiple stress factors compound (e.g., high workload combined with unclear assignments).

  • Prevention Strategies: The quantified impact of different stressors (e.g., 32% contribution from excessive workload) enables prioritized safety interventions.

Implications for safety professionals:

  • Risk Evaluation: Safety officers can use the model to assess how stress levels may compromise safety protocols and behavior.

  • Intervention Design: The bidirectional analysis of stress factors (inducing vs. mitigating) enables targeted safety program development.

  • Performance Monitoring: The tool provides objective metrics for evaluating the effectiveness of safety interventions.

Implications for organizational safety culture:

  • Policy Development: Organizations can develop evidence-based safety policies that address both direct hazards and stress-related risks.

  • Training Programs: The identified stress patterns can inform safety training programs that address both technical and psychosocial aspects.

  • Communication Strategies: The importance of clear work assignments (stress reduction of 28%) suggests the need for improved safety communication protocols.

Implications for regulatory framework:

  • Standard Development: The findings can inform occupational safety standards that incorporate stress management requirements.

  • Inspection Protocols: Regulatory bodies can develop more comprehensive inspection protocols that include stress assessment.

  • Risk Classification: The quantified stress impacts can help in developing risk classification systems for different workplace environments.

Limitations and future work

While this study advances workplace safety through AI-driven stress detection, several limitations warrant attention:

  • Population Specificity: The current model is based on Malaysian workplace data, potentially limiting its generalizability to different safety cultures and regulatory environments. Future research should improve generalizability by taking diverse populations into consideration.

  • Temporal Dynamics: The cross-sectional nature of the data doesn’t capture how stress patterns evolve over time in response to changing safety conditions. Future research may benefit by focusing on longitudinal studies examining the relationship between stress patterns and safety incidents.

  • Intervention Validation: While the study identifies stress factors, the effectiveness of specific safety interventions based on these findings requires validation. Future research could design and test targeted interventions, evaluating their effectiveness using the proposed occupational stress detection and workplace safety models.

These future directions will further strengthen the connection between stress management and workplace safety, ultimately contributing to more effective occupational safety programs.

Conclusions

This study presents a safety-centered, AI-driven framework for occupational stress detection that integrates machine learning, deep learning, and large language models into proactive workplace safety management. By combining a comprehensive preprocessing pipeline, multi-technique feature selection, and robust model development, we achieved a 90.32% accuracy, surpassing the performance of existing state-of-the-art methods. One crucial finding of the research is that combining RFECV and ANOVA techniques for feature selection yields better prediction accuracy than using them individually. Moreover, domain analysis using LLMs revealed that occupational stress is closely related to the biomedical domain than clinical or generalist domains, indicating that occupational stress rises from both physical and psychological factors rather than just one of them.

The employed methods in the study collectively offer a quantifiable and interpretable approach to understanding how organizational elements, particularly excessive workload and ambiguity (27%), poor communication (17%), and positive work environment (16%), impact occupational stress levels. Our three-fold validation techniques: holdout validation, cross-validation, and external validation with synthetic data, establishes the reliability and robustness of this framework, providing substantial evidence for its applicability in diverse settings.

Although our approach addresses critical gaps in current occupational stress research, there remain potential limitations related to population specificity, temporal dynamics, and intervention validation. Future work can extend this framework to different cultural contexts, employ longitudinal studies for capturing stress evolution, and integrate targeted interventions to validate effectiveness. By continually refining these components, organizations and safety practitioners can proactively mitigate the risks associated with occupational stress, ultimately fostering safer and more resilient work environments.

Supporting information

S1 Dataset. Synthetic data using CopulaGAN.

(CSV)

pone.0323265.s001.csv (691.7KB, csv)
S2 Dataset. Synthetic data using CTGAN.

(CSV)

pone.0323265.s002.csv (640.8KB, csv)
S3 Dataset. Synthetic data using GaussianCopula.

(CSV)

pone.0323265.s003.csv (716.8KB, csv)
S4 Dataset. Synthetic data using TVAE.

(CSV)

pone.0323265.s004.csv (696.7KB, csv)
S5 Text. Codebook for synthetic data.

(PDF)

pone.0323265.s005.pdf (33.6KB, pdf)

Data Availability

The data used in this study can be accessed from https://data.mendeley.com/datasets/cgyh5s88kc/3 DOI: 10.17632/cgyh5s88kc.3 Synthetic data generated in this study can be found from the supporting information files. Code can be found at https://github.com/junayed-hasan/occupational-stress-ml.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1.Sparks K, Faragher B, Cooper CL. Well‐being and occupational health in the 21st century workplace. J Occupat Organ Psyc. 2001;74(4):489–509. doi: 10.1348/096317901167497 [DOI] [Google Scholar]
  • 2.Abrams HK. A short history of occupational health. J Publ Health Policy. 2001;22(1):34–80. doi: 10.2307/3343553 [DOI] [PubMed] [Google Scholar]
  • 3.LaDou J. International occupational health. Int J Hyg Environ Health. 2003;206(4–5):303–13. doi: 10.1078/1438-4639-00226 [DOI] [PubMed] [Google Scholar]
  • 4.Motowidlo SJ, Packard JS, Manning MR. Occupational stress: its causes and consequences for job performance. J Appl Psychol. 1986;71(4):618–29. doi: 10.1037//0021-9010.71.4.618 [DOI] [PubMed] [Google Scholar]
  • 5.Rout UR, Rout JK. Occupational stress. Stress management for primary health care professionals. 2002; p. 25–39.
  • 6.Torres GMS, Backstrom J, Duffy VG. A systematic review of workplace stress and its impact on mental health and safety. In: International Conference on Human-Computer Interaction. Springer; 2023. p. 610–27. [Google Scholar]
  • 7.Davies ACL. Stress at work: individuals or structures?. Indust Law J. 2021;51(2):403–34. doi: 10.1093/indlaw/dwab006 [DOI] [Google Scholar]
  • 8.Härkänen T, Kuulasmaa K, Sares-Jäske L, Jousilahti P, Peltonen M, Borodulin K, et al. Estimating expected life-years and risk factor associations with mortality in Finland: cohort study. BMJ Open. 2020;10(3):e033741. doi: 10.1136/bmjopen-2019-033741 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Magnusson Hanson LL, Westerlund H, Chungkham HS, Vahtera J, Rod NH, Alexanderson K, et al. Job strain and loss of healthy life years between ages 50 and 75 by sex and occupational position: analyses of 64 934 individuals from four prospective cohort studies. Occup Environ Med. 2018;75(7):486–93. doi: 10.1136/oemed-2017-104644 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Skorikov M, Hussain MA, Khan MR, Akbar MK, Momen S, Mohammed N, et al. Prediction of absenteeism at work using data mining techniques. In: 2020 5th International Conference on Information Technology Research (ICITR). 2020. p. 1–6. doi: 10.1109/icitr51448.2020.9310913 [DOI]
  • 11.Santos L, Ferreira A, Silva DR da, Pinheiro M, Rijo D. Assessing occupational stress in residential youth care settings: validation of the stress questionnaire for residential youth care professionals. Resident Treat Child Youth. 2022;40(2):217–37. doi: 10.1080/0886571x.2022.2073940 [DOI] [Google Scholar]
  • 12.Long H, Yan L, Zhong X, Yang L, Liu Y, Pu J, et al. Measuring job stress of dental workers in China during the COVID-19 pandemic: reliability and validity of the hospital consultants’ job stress questionnaire. BMC Psychiatry. 2024;24(1):246. doi: 10.1186/s12888-024-05670-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Pangarkar SC, Paigude S, Banait SS, Ajani SN, Mange P, Bramhe MV. Occupational stress and mental health: a longitudinal study in high-stress professions. South Eastern Eur J Publ Health. 2023;68–80.
  • 14.Saravanan P, Nisar T, Zhang Q, Masud F, Sasangohar F. Occupational stress and burnout among intensive care unit nurses during the pandemic: a prospective longitudinal study of nurses in COVID and non-COVID units. Front Psychiatry. 2023;14:1129268. doi: 10.3389/fpsyt.2023.1129268 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Coelho LG, Costa PR de F, Kinra S, Mallinson PAC, Akutsu R de CC de A. Association between occupational stress, work shift and health outcomes in hospital workers of the Recôncavo of Bahia, Brazil: the impact of COVID-19 pandemic. Br J Nutr. 2023;129(1):147–56. doi: 10.1017/S0007114522000873 [DOI] [PubMed] [Google Scholar]
  • 16.Zhang M, Liu B, Ke W, Cai Y, Zhang L, Huang W, et al. Correlation analysis between occupational stress and metabolic syndrome in workers of a petrochemical enterprise: based on two assessment models of occupational stress. BMC Public Health. 2024;24(1):802. doi: 10.1186/s12889-024-18305-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.von Känel R, Princip M, Holzgang SA, Garefa C, Rossi A, Benz DC, et al. Coronary microvascular function in male physicians with burnout and job stress: an observational study. BMC Med. 2023;21(1):477. doi: 10.1186/s12916-023-03192-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Wójtowicz D, Kowalska J. Analysis of the sense of occupational stress and burnout syndrome among physiotherapists during the COVID-19 pandemic. Sci Rep. 2023;13(1):5743. doi: 10.1038/s41598-023-32958-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Strid EN, Wåhlin C, Ros A, Kvarnström S. Health care workers’ experiences of workplace incidents that posed a risk of patient and worker injury: a critical incident technique analysis. BMC Health Serv Res. 2021;21(1):511. doi: 10.1186/s12913-021-06517-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Leung M-Y, Liang Q, Olomolaiye P. Impact of job stressors and stress on the safety behavior and accidents of construction workers. J Manage Eng. 2016;32(1):04015019. doi: 10.1061/(asce)me.1943-5479.0000373 [DOI] [Google Scholar]
  • 21.Wu X, Wei Y, Jiang T, Wang Y, Jiang S. A micro-aggregation algorithm based on density partition method for anonymizing biomedical data. CBIO. 2019;14(7):667–75. doi: 10.2174/1574893614666190416152025 [DOI] [Google Scholar]
  • 22.Wu X, Zhang YT, Lai KW, Yang MZ, Yang GL, Wang HH. A novel centralized federated deep fuzzy neural network with multi-objectives neural architecture search for epistatic detection. IEEE Trans Fuzzy Syst. 2024.
  • 23.Hasan MJ, Rafat K, Rahman F, Mohammed N, Rahman S. DeepMarkerNet: leveraging supervision from the Duchenne Marker for spontaneous smile recognition. Pattern Recognit Lett. 2024;186:148–55. doi: 10.1016/j.patrec.2024.09.015 [DOI] [Google Scholar]
  • 24.Shoaib MA, Lai KW, Chuah JH, Hum YC, Ali R, Dhanalakshmi S, et al. Comparative studies of deep learning segmentation models for left ventricle segmentation. Front Public Health. 2022;10:981019. doi: 10.3389/fpubh.2022.981019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Hasan MJ, Rahman F, Mohammed N. OptimCLM: Optimizing clinical language models for predicting patient outcomes via knowledge distillation, pruning and quantization. Int J Med Inform. 2025;195:105764. doi: 10.1016/j.ijmedinf.2024.105764 [DOI] [PubMed] [Google Scholar]
  • 26.Khan AE, Hasan MJ, Anjum H, Mohammed N, Momen S. Predicting life satisfaction using machine learning and explainable AI. Heliyon. 2024;10(10). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Bosma H, Peter R, Siegrist J, Marmot M. Two alternative job stress models and the risk of coronary heart disease. Am J Public Health. 1998;88(1):68–74. doi: 10.2105/ajph.88.1.68 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Kivimäki M, Leino-Arjas P, Luukkonen R, Riihimäki H, Vahtera J, Kirjonen J. Work stress and risk of cardiovascular mortality: prospective cohort study of industrial employees. BMJ. 2002;325(7369):857. doi: 10.1136/bmj.325.7369.857 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Chandola T, Brunner E, Marmot M. Chronic stress at work and the metabolic syndrome: prospective study. BMJ. 2006;332(7540):521–5. doi: 10.1136/bmj.38693.435301.80 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.CropanzanO R, Howes JC, Grandey AA, Toth P. The relationship of organizational politics and support to work behaviors, attitudes, and stress. J Organiz Behav. 1997;18(2):159–80. doi: [DOI] [Google Scholar]
  • 31.Ramirez AJ, Graham J, Richards MA, Cull A, Gregory WM. Mental health of hospital consultants: the effects of stress and satisfaction at work. Lancet. 1996;347(9003):724–8. doi: 10.1016/s0140-6736(96)90077-x [DOI] [PubMed] [Google Scholar]
  • 32.Ganster DC, Fusilier MR, Mayes BT. Role of social support in the experience of stress at work. J Appl Psychol. 1986;71(1):102–10. doi: 10.1037//0021-9010.71.1.102 [DOI] [PubMed] [Google Scholar]
  • 33.Russell DW, Altmaier E, Van Velzen D. Job-related stress, social support, and burnout among classroom teachers. J Appl Psychol. 1987;72(2):269–74. doi: 10.1037//0021-9010.72.2.269 [DOI] [PubMed] [Google Scholar]
  • 34.Landsbergis PA. Occupational stress among health care workers: a test of the job demands‐control model. J Organ Behavior. 1988;9(3):217–39. doi: 10.1002/job.4030090303 [DOI] [Google Scholar]
  • 35.Akerstedt T, Knutsson A, Westerholm P, Theorell T, Alfredsson L, Kecklund G. Sleep disturbances, work stress and work hours: a cross-sectional study. J Psychosom Res. 2002;53(3):741–8. doi: 10.1016/s0022-3999(02)00333-1 [DOI] [PubMed] [Google Scholar]
  • 36.Judge TA, Colquitt JA. Organizational justice and stress: the mediating role of work-family conflict. J Appl Psychol. 2004;89(3):395–404. doi: 10.1037/0021-9010.89.3.395 [DOI] [PubMed] [Google Scholar]
  • 37.Frideman M, Rosenman RH, Carroll V. Changes in the serum cholesterol and blood clotting time in men subjected to cyclic variation of occupational stress. Circulation. 1958;17(5):852–61. doi: 10.1161/01.cir.17.5.852 [DOI] [PubMed] [Google Scholar]
  • 38.Leymann H, Gustafsson A. Mobbing at work and the development of post-traumatic stress disorders. Eur J Work Organiz Psychol. 1996;5(2):251–75. doi: 10.1080/13594329608414858 [DOI] [Google Scholar]
  • 39.Hansen AM, Hogh A, Persson R, Karlson B, Garde AH, Ørbaek P. Bullying at work, health outcomes, and physiological stress response. J Psychosom Res. 2006;60(1):63–72. doi: 10.1016/j.jpsychores.2005.06.078 [DOI] [PubMed] [Google Scholar]
  • 40.Rosenthal T, Alter A. Occupational stress and hypertension. J Am Soc Hypertens. 2012;6(1):2–22. doi: 10.1016/j.jash.2011.09.002 [DOI] [PubMed] [Google Scholar]
  • 41.Buunk B, de Jonge J, Ybema J, de Wolff C. Psychosocial aspects of occupational stress. In: A handbook of work and organizational psychology. Psychology Press; 2013. p. 145–82. [Google Scholar]
  • 42.Moreno Fortes A, Tian L, Huebner ES. Occupational stress and employees complete mental health: a cross-cultural empirical study. Int J Environ Res Public Health. 2020;17(10):3629. doi: 10.3390/ijerph17103629 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Schwarzer R, Hallum S. Perceived teacher self‐efficacy as a predictor of job stress and burnout: mediation analyses. Appl Psychol. 2008;57(s1):152–71. doi: 10.1111/j.1464-0597.2008.00359.x [DOI] [Google Scholar]
  • 44.Bozorgmehr A, Thielmann A, Weltermann B. Chronic stress in practice assistants: An analytic approach comparing four machine learning classifiers with a standard logistic regression model. PLoS One. 2021;16(5):e0250842. doi: 10.1371/journal.pone.0250842 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Morales C, Dolan SC, Anderson DA, Anderson LM, Reilly EE. Exploring the contributions of affective constructs and interoceptive awareness to feeling fat. Eat Weight Disord. 2022;27(8):3533–41. doi: 10.1007/s40519-022-01490-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Mohan L, Panuganti G. Perceived stress prediction among employees using machine learning techniques. In: 2022 International Conference on Communication, Computing and Internet of Things (IC3IoT). IEEE; 2022. p. 1–6. [Google Scholar]
  • 47.Garlapati A, Krishna DR, Garlapati K, Narayanan G. Predicting employees under stress for pre-emptive remediation using machine learning algorithm. In: 2020 International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT). 2020. p. 315–9. doi: 10.1109/rteict49044.2020.9315726 [DOI]
  • 48.Gias FB, Alam F, Momen S. Anxiety mining from socioeconomic data. In: Computer Science On-line Conference. Springer; 2023. p. 472–88. [Google Scholar]
  • 49.Siddiqua R, Islam N, Bolaka JF, Khan R, Momen S. AIDA: artificial intelligence based depression assessment applied to Bangladeshi students. Array. 2023;18:100291. doi: 10.1016/j.array.2023.100291 [DOI] [Google Scholar]
  • 50.Kim S-S, Gil M, Min EJ. Machine learning models for predicting depression in Korean young employees. Front Public Health. 2023;11:1201054. doi: 10.3389/fpubh.2023.1201054 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Xia Z, Chen C-H, Lim WL. An explorative neural networks-enabled approach to predict stress perception of traffic control operators in dynamic working scenarios. Adv Eng Inform. 2023;56:101972. doi: 10.1016/j.aei.2023.101972 [DOI] [Google Scholar]
  • 52.Majid NFH, Muhamad S, Kusairi S, Ramli R. Survey dataset on occupational stress, job satisfaction, and job performance among male fertility patients. Data Brief. 2024;53:110152. doi: 10.1016/j.dib.2024.110152 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Hasan MJ, Mahdy M. Bridging classical and quantum machine learning: Knowledge transfer from classical to quantum neural networks using knowledge distillation. arXiv preprint. 2023. https://arxiv.org/abs/2311.13810
  • 54.Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics; 2019. p. 4171–86. [Google Scholar]
  • 55.Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234–40. doi: 10.1093/bioinformatics/btz682 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Huang K, Altosaar J, Ranganath R. Clinicalbert: Modeling clinical notes and predicting hospital readmission. arXiv preprint. 2019. https://arxiv.org/abs/1904.05342
  • 57.Alsentzer E, Murphy JR, Boag W, Weng WH, Jin D, Naumann T, et al. Publicly available clinical BERT embeddings; 2019.
  • 58.Van Aken B, Papaioannou JM, Mayrdorfer M, Budde K, Gers FA, Loeser A. Clinical outcome prediction from admission notes using self-supervised knowledge integration. arXiv preprint. 2021. https://arxiv.org/abs/2102.04110
  • 59.Hasan MJ, Noor S, Khan MA. Preserving the knowledge of long clinical texts using aggregated ensembles of large language models. arXiv preprint. 2023. https://arxiv.org/abs/2311.01571
  • 60.Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30. [Google Scholar]
  • 61.Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature. 2020;585(7825):357–62. doi: 10.1038/s41586-020-2649-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.McKinney W, et al. Data structures for statistical computing in Python. SciPy. 2010;445:51–6. [Google Scholar]
  • 63.Hunter JD. Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007;9(3):90–5. doi: 10.1109/mcse.2007.55 [DOI] [Google Scholar]
  • 64.Waskom M. seaborn: statistical data visualization. JOSS. 2021;6(60):3021. doi: 10.21105/joss.03021 [DOI] [Google Scholar]
  • 65.Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. Pytorch: an imperative style, high-performance deep learning library. Adv Neural Inf Process Syst. 2019;32. [Google Scholar]
  • 66.Kingma DP. Adam: a method for stochastic optimization. arXiv preprint. 2014. https://arxiv.org/abs/1412.6980
  • 67.Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, et al. Transformers: State-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations; 2020. p. 38–45.
  • 68.Loshchilov I, Hutter F. Decoupled weight decay regularization. arXiv preprint. 2017. https://arxiv.org/abs/1711.05101
  • 69.Patki N, Wedge R, Veeramachaneni K. The synthetic data vault. In: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE; 2016. p. 399–410. [Google Scholar]
  • 70.Li Z, Zhao Y, Fu J. SynC: a copula based framework for generating synthetic data from aggregated sources. In: 2020 International Conference on Data Mining Workshops (ICDMW). 2020. p. 571–8. doi: 10.1109/icdmw51313.2020.00082 [DOI]
  • 71.Xu L, Skoularidou M, Cuesta-Infante A, Veeramachaneni K. Modeling tabular data using conditional gan. Adv Neural Inf Process Syst. 2019; 32. [Google Scholar]
  • 72.Kamthe S, Assefa S, Deisenroth M. Copula flows for synthetic data generation. arXiv preprint. 2021. https://arxiv.org/abs/2101.00598
  • 73.Vakharia V, Gujar R. Prediction of compressive strength and Portland cement composition using cross-validation and feature ranking techniques. Construct Build Mater. 2019;225:292–301. doi: 10.1016/j.conbuildmat.2019.07.224 [DOI] [Google Scholar]
  • 74.Oyedele O. Determining the optimal number of folds to use in a K-fold cross-validation: a neural network classification experiment. Res Math. 2023;10(1):2201015. [Google Scholar]

Decision Letter 0

Matthew Chin Heng Chua

PONE-D-24-54716Early detection of occupational stress: Enhancing workplace safety with machine learning and large language modelsPLOS ONE

Dear Dr. Momen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 28 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Matthew Chin Heng Chua

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

Additional Editor Comments (if provided):

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The issues are listed in the following:

1. The professional English editing is recommended. The authors should get editing help from someone with full professional proficiency in English.

2. The abstract contains a large amount of information. It is recommended to simplify the language and highlight the innovative points and main results of the study. Some important methods suggested references is higher quality or a new literature, such as CNN "DOI: 10.1109 / TFUZZ. 2024.3369944", "DOI10.3389 / fpubh. 2022.981019"; Machine learning methods can refer to "DOI10.2174/1574893614666190416152025".

3. Although SHAP and LIME are used for model interpretation, the results primarily focus on feature importance, lacking in-depth analysis of the model's decision-making process.

4.  It is recommended to include an analysis of the differences between synthetic and real data, discuss the limitations of synthetic data generation methods, and propose improvements.

5. High-resolution images should be provided to ensure that all details are visible and can be thoroughly examined by readers.

6. It is recommended to expand the literature review section, particularly with a deeper discussion of recent advancements in AI technologies for occupational stress detection.

7. It is recommended to add a detailed description of the synthetic data generation methods in the methodology section, including the algorithms used, parameter settings, and quality assessment of the generated data.

8. What is the main difference or importance of the proposed methods and the other state-of-the-arts?

9. The conclusion should be concise and powerful, summarizing the main findings and contributions of the research.

10. The Conclusion section should point out the potential disadvantages and possible future research directions of the manuscript. How this work can be extended in future?

Reviewer #2: My comments are as follows:

1. How does the proposed AI-based framework combine machine learning, deep learning, and large language models to create a unified approach for occupational stress detection?

2. The manuscript mentions the integration of RFECV and ANOVA for feature selection. How were these techniques combined, and what criteria were used to determine the final set of 39 critical stressors?

3. Could you elaborate on the advanced preprocessing techniques used, particularly in managing imbalanced data or missing values?

4. The manuscript reports a 90.32% accuracy for the ensemble model and 89.00% on synthetic data. Were other performance metrics, such as precision, recall, F1-score, and AUC-ROC, evaluated to assess the model’s reliability in stress detection?

5. 3.It is always recommended to apply 10-fold CV for unbiased and reliable prediction results, which is not possible due to the random split of data during training and testing. It is kindly requested to consider these recommendations and add results OR discussion in the revised version with reference to the suggested references provided.

a. https://doi.org/10.1016/j.conbuildmat.2019.07.224

b. https://doi.org/10.1080/27684830.2023.2201015

6. The use of Explainable AI techniques is highlighted in the study. How were these techniques implemented, and what specific visualizations or interpretability tools were used to quantify the impact of workplace safety factors?

7. The study identifies excessive workload, unclear work assignments, and poor organizational communication as primary stressors. How were these percentages (e.g., 32%, 28%) quantified, and what statistical or machine learning methods supported these findings?

Reviewer #3: Authors investigated an AI-based framework for proactive occupational stress detection to improve workplace safety. However, before acceptance, major corrections are required:

1.In abstract, please clearly specify the Purpose, Contribution, and findings.

2.Please write down the organization of the paper written end of Introduction.

3.Please provide a mathematical model of proposed 1D CNN.

4.Please analysis computational complexity in your proposed model.

5.It is recommended that professional proofreaders and native English corrections be used.

Reviewer #4: The authors have put sound efforts into addressing the research gap of the early detection of occupational stress in women. This paper explores the application of 11 machine learning techniques, 1D-CNN and five LLM in detecting stress among women.

The work is acceptable subject to the following minor modifications:

1.It claimed that a framework has been developed with ability to process diverse data types and provide explainable results, however, no such framework is presented in the manuscript.

2.In Figure 1, the alphabetical numbering of the captions needs to be corrected.

3.Why in Figure 2, Box plots show a same range for the data distribution for six job performance survey questions?

4.One of the contributions is empirical demonstration by extracting more predictive features than individual methods, detail explanation is missing.

5.More explanation of figure 12 and 13 is required to be added.

6.Figure 14 visibility should be increased.

Reviewer #5: PAper is well written and significantly addressed the research problem successfully by employment machine learning models and statistically analysed results. All results well tabulated with grpahs. Also model has been generalised too

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Saifur Rahman Sabuj

Reviewer #4: No

Reviewer #5: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Jun 2;20(6):e0323265. doi: 10.1371/journal.pone.0323265.r003

Author response to Decision Letter 1


25 Mar 2025

Responses to the reviewer and editor comments have been provided in a file titled "Response to Reviewers". Please find the responses in the relevant file. Thank you.

Attachment

Submitted filename: Response to Reviewers.pdf

pone.0323265.s006.pdf (82.7KB, pdf)

Decision Letter 1

Ashad Kabir

Early detection of occupational stress: Enhancing workplace safety with machine learning and large language models

PONE-D-24-54716R1

Dear Dr. Momen,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ashad Kabir, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

The authors have adequately addressed the reviewers' comments, and the reviewers have recommended the manuscript for acceptance.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: All comments have been addressed

Reviewer #4: All comments have been addressed

Reviewer #5: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: No

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The author has answered all my questions, and in; Some modifications have been made in the paper and it is recommended that the paper be published.

Reviewer #3: Thanks for correction. Please update the related work section. Some important paper in 2024 and 2025 are missing.

1) An end-to-end lightweight multi-scale CNN for the classification of lung and colon cancer with XAI integration

Reviewer #4: (No Response)

Reviewer #5: All Reviewer comments addressed and paper be accepted. The paper be accepted without any further revision.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

Reviewer #4: Yes: Prof. Sania Bhatti

Reviewer #5: No

**********

Acceptance letter

Ashad Kabir

PONE-D-24-54716R1

PLOS ONE

Dear Dr. Momen,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Ashad Kabir

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Dataset. Synthetic data using CopulaGAN.

    (CSV)

    pone.0323265.s001.csv (691.7KB, csv)
    S2 Dataset. Synthetic data using CTGAN.

    (CSV)

    pone.0323265.s002.csv (640.8KB, csv)
    S3 Dataset. Synthetic data using GaussianCopula.

    (CSV)

    pone.0323265.s003.csv (716.8KB, csv)
    S4 Dataset. Synthetic data using TVAE.

    (CSV)

    pone.0323265.s004.csv (696.7KB, csv)
    S5 Text. Codebook for synthetic data.

    (PDF)

    pone.0323265.s005.pdf (33.6KB, pdf)
    Attachment

    Submitted filename: Response to Reviewers.pdf

    pone.0323265.s006.pdf (82.7KB, pdf)

    Data Availability Statement

    The data used in this study can be accessed from https://data.mendeley.com/datasets/cgyh5s88kc/3 DOI: 10.17632/cgyh5s88kc.3 Synthetic data generated in this study can be found from the supporting information files. Code can be found at https://github.com/junayed-hasan/occupational-stress-ml.


    Articles from PLOS One are provided here courtesy of PLOS

    RESOURCES