PLOS ONE. 2021 May 4;16(5):e0250842. doi: 10.1371/journal.pone.0250842

Chronic stress in practice assistants: An analytic approach comparing four machine learning classifiers with a standard logistic regression model

Arezoo Bozorgmehr 1,*, Anika Thielmann 1,2, Birgitta Weltermann 1,2
Editor: Alfredo Vellido3
PMCID: PMC8096078  PMID: 33945572

Abstract

Background

Occupational stress is associated with adverse outcomes for medical professionals and patients. In our cross-sectional study with 136 general practices, 26.4% of 550 practice assistants showed high chronic stress. As machine learning strategies offer the opportunity to improve understanding of chronic stress by exploiting complex interactions between variables, we used data from our previous study to derive the best analytic model for chronic stress: four common machine learning (ML) approaches are compared to a classical statistical procedure.

Methods

We applied four machine learning classifiers (random forest, support vector machine, K-nearest neighbors, and artificial neural network) and logistic regression as the standard approach to analyze factors contributing to chronic stress in practice assistants. Chronic stress had been measured by the standardized, self-administered TICS-SSCS questionnaire. The performance of these models was compared in terms of predictive accuracy based on the area under the receiver operating characteristic curve (AUC), sensitivity, and positive predictive value.

Findings

Compared to the standard logistic regression model (AUC 0.636, 95% CI 0.490–0.674), all machine learning models improved prediction: random forest +20.8% (AUC 0.844, 95% CI 0.684–0.843), artificial neural network +12.4% (AUC 0.760, 95% CI 0.605–0.777), support vector machine +15.1% (AUC 0.787, 95% CI 0.634–0.802), and K-nearest neighbours +7.1% (AUC 0.707, 95% CI 0.556–0.735). As best prediction model, random forest showed a sensitivity of 99% and a positive predictive value of 79%. Using the variable frequencies at the decision nodes of the random forest model, the following five work characteristics influence chronic stress: too much work, high demand to concentrate, time pressure, complicated tasks, and insufficient support by practice leaders.

Conclusions

Regarding chronic stress prediction, machine learning classifiers, especially random forest, provided more accurate prediction compared to classical logistic regression. Interventions to reduce chronic stress in practice personnel should primarily address the identified workplace characteristics.

1. Introduction

Occupational stress is an important issue for health care and other workers worldwide [1]. Following stress models introduced by Selye, Lazarus and others, it was shown that chronic stress can lead to adverse (mental) health effects such as burnout or depression [2, 3]. Also, stress can produce temporary or even permanent alterations in memory [4], cognition [5], arousal/sleep [6, 7], and coping behaviours [8]. In our prior study with 214 general practitioners (GPs) and 550 practice assistants from 136 German general practices, we showed that 19.9% of the male GPs (n = 141), 35.6% of the female GPs (n = 73) and 26.4% of the practice assistants (PrAs) had high chronic stress [9]. Overall, the mean prevalence of high chronic stress was 26.3% in this workforce, more than twice the prevalence in the general population (11%) studied in the representative German Health Interview and Examination Survey for Adults (DEGS1) with more than 7,900 participants [10, 11]. Analyzing various work and (regional) practice characteristics, we showed that only the weekly working hours correlated with high chronic stress in GPs and PrAs.

However, to develop effective prevention strategies, a more profound understanding of the factors causing and/or contributing to high psychological strain on an individual and group level is needed. As workplaces typically are complex and multifactorial social organizations, appropriate statistical methods are needed to analyse complex associations and cause-effect relationships. Prior studies addressing impaired psychological well-being in primary care workers used standard statistical procedures such as prevalence ratios and logistic regression models to evaluate associations [9, 12, 13]. These statistical approaches usually simplify the complex relationships between independent variables (features) and the response variable (dependent variable): they assume that each independent variable is linked to the outcome by a linear statistical function. This is especially problematic for datasets with large numbers of non-linear relationships and interaction effects between independent variables, which make the model more complex [14]. Nowadays, machine learning (ML) approaches offer new opportunities to evaluate complex relationships. Conceptually, ML has the benefit that it efficiently exploits complex and non-linear interactions between variables by minimizing the error between predicted and observed response variables, and improves the accuracy of the models compared to standard approaches [15, 16]. Using a large dataset available on practice assistants from our prior study, we aim to develop a better understanding of workplace factors associated with chronic stress in practice assistants using machine learning. Thus, we compare four machine learning classifiers (random forest, support vector machine, K-nearest neighbors, artificial neural network) with a standard logistic regression model using standard measures of test accuracy, i.e. to derive the best prediction model for chronic stress in practice assistants in primary care.

Regarding terminology, we would like to point out that we use the term “prediction” as used in the context of machine learning: it refers to the output of an algorithm after it has been trained on a dataset and applied to new data to forecast the likelihood of a particular outcome. In contrast, in epidemiological analyses, a (risk) prediction model refers to a mathematical equation that uses patient characteristics (risk factors) to estimate the probability of a defined outcome prospectively.

2. Methods

2.1 Data source

The dataset used for the analyses was derived from our cross-sectional study addressing stress among general practice personnel (GPs, PrAs), which was performed among general practices belonging to the teaching practice network of the Institute for General Medicine, University Hospital Essen, Essen, Germany. A total of 764 professionals from 136 practices had taken part in the survey, which was performed in 2014. The design of the study and key results addressing the 214 GPs (practice owners and employed physicians) and 550 practice assistants (PrAs) (including medical secretaries and practice assistants in training) have been published [9]. This analysis addresses chronic stress in the 550 practice assistants (PrAs), who are the largest professional group in general practices. We documented that 26.4% of the 550 practice assistants (PrAs) had high chronic stress, as well as 19.9% of the male (n = 141) and 35.6% of the female (n = 73) general practitioners (GPs) [9]. In this workforce, the overall proportion of workers with high chronic stress was 26.3% (n = 201).

2.2 Ethics statement

Ethical approval for the survey had been obtained from the Ethics Committee of the Medical Faculty of the University of Duisburg-Essen (reference number: 13-5536-BO, date of approval: 24/11/2014). All participants had received written information and signed informed consent forms. The principal investigator of the study (B.W.), a coauthor of this manuscript, provided the data for this analysis.

2.3 Outcome

The primary outcome is strain due to chronic stress over the past three months. Chronic stress was measured using the German short version of the standardized, validated, self-administered TICS-SSCS questionnaire [17, 18]. It consists of 12 items on 5-point Likert scales (0 = ‘never’ and 4 = ‘very often’). The TICS-SSCS item values are added to a sum score. The score ranges from 0 to 48, with 0 denoting ‘never stressed’ and 48 ‘very often stressed’, and reflects subjective strain due to chronic stress [17, 18]. Following the definition of chronic stress of our prior analysis, the TICS scores were dichotomized using the median (TICS = 23) as cut-off (0 = no chronic stress (TICS < 23), 1 = strain due to chronic stress (TICS ≥ 23)).
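The scoring rule described above can be sketched in a few lines of Python. This is an illustration only: the function names `tics_sum_score` and `has_high_strain` and the example responses are hypothetical and do not come from the study's code.

```python
def tics_sum_score(item_responses):
    """Add up the 12 TICS-SSCS item responses (each coded 0-4)."""
    if len(item_responses) != 12:
        raise ValueError("TICS-SSCS expects exactly 12 item responses")
    if any(response not in range(5) for response in item_responses):
        raise ValueError("each response must be an integer from 0 to 4")
    return sum(item_responses)


def has_high_strain(sum_score, cutoff=23):
    """Dichotomize the sum score: 1 if score >= cutoff, otherwise 0."""
    return int(sum_score >= cutoff)


# Hypothetical respondents, not taken from the study data.
respondent_a = [2] * 11 + [1]      # sum score 23
respondent_b = [2] * 10 + [1, 1]   # sum score 22

score_a = tics_sum_score(respondent_a)
score_b = tics_sum_score(respondent_b)
label_a = has_high_strain(score_a)
label_b = has_high_strain(score_b)
```

A respondent scoring exactly at the cut-off falls into the strained group, because the rule uses "greater than or equal to".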

2.4 Socio-demographic and workplace characteristics

A total of 64 sociodemographic and workplace characteristics were used for the analyses. The sociodemographic characteristics included, e.g., age, marital status, and number of persons in household. Work-related characteristics comprised details on the employment (e.g., number of hours per week, work status, employment contract), duties in practice (e.g., reception, telephone, prescription, blood pressure measurement) and subjective perceptions of workload (e.g., self-determination of sequence of work steps, influence on work assigned, planning the work independently). The standardized ‘short questionnaire for workplace analysis’ (German: Kurzfragebogen zur Arbeitsanalyse (KFZA)) was used to assess workplace characteristics [19]. For details on the work characteristics see Tables 1–3. In line with the TICS instrument, which addresses strain due to chronic stress during the past three months, all workplace characteristics had been requested regarding the past three months (see Table 4).

Table 1. Sociodemographic characteristics of practice assistants (n = 550) and strain due to chronic stress (measured by the standardized and validated TICS tool): Items and sum scores.

Participants (N = 550)
Continuous variables Mean SD Range
Age 38 12.61 16–71
Persons in household above age 18 2 1.12 0–6
Persons in household below age 18 1 0.84 0–6
Number of physicians in practice 3 2.16 1–10
Number of practice assistants in practice 8 7.66 0–35
Categorical variables n %
Female gender 544 99.3
Marital status
Married 277 50.4
Single 221 40.2
Divorced 45 8.2
Widowed 7 1.3
Number of persons in household 72 13.1
Cares for next of kin 75 13.6
Working hours/week
1–9 hours 12 2.2
10–19 hours 52 9.5
20–29 hours 116 21.1
30–39 hours 221 40.2
40–49 hours 116 21.1
50–59 hours 12 2.2
>60 hours 10 1.8
Working full time 364 66.2
Has open-ended employment contract 466 84.7
Had participated in stress seminar in the past 31 5.6
Had used counseling for stress reduction 50 9.1
High strain due to chronic stress (TICS ≥ 23) 125 22.7

Table 3. Self-assessment of workplace situation (n = 550 practice assistants).

| Work aspects | Workplace factor | Mean Score (PrAs) | 95% CI |
|---|---|---|---|
| Job content | Versatility | 3.6 | 3.58–3.7 |
| | Completeness of task | 3.5 | 3.41–3.57 |
| Resources | Scope of action | 3.4 | 3.37–3.49 |
| | Social support | 4.0 | 3.98–4.12 |
| | Cooperation | 3.6 | 3.53–3.66 |
| Stressors | Qualitative work demands | 2.2 | 2.14–2.29 |
| | Quantitative work demands | 2.9 | 2.83–3.01 |
| | Work disruptions | 2.7 | 2.67–2.81 |
| | Workplace environment | 2.2 | 2.13–2.3 |
| Organizational culture | Information and participation | 3.6 | 3.57–3.73 |
| | Benefits | 2.9* | 2.77–2.94 |

Table 4. Chronic stress of practice assistants: Results of TICS (Trier Inventory for Chronic Stress) (n = 550).

| How often in the last 3 months did you experience … | Never, n (%) | Rarely, n (%) | Sometimes, n (%) | Frequently, n (%) | Very frequently, n (%) |
|---|---|---|---|---|---|
| Fear, something unpleasant might occur | 72 (13.1) | 213 (38.7) | 190 (34.5) | 54 (9.8) | 21 (3.8) |
| Lack of recognition for good performance | 158 (28.7) | 157 (28.5) | 121 (22.0) | 71 (12.9) | 42 (7.6) |
| Times with too many obligations | 38 (6.9) | 119 (21.6) | 167 (30.4) | 157 (28.5) | 67 (12.2) |
| Times when being unable to suppress worrying thoughts | 90 (16.4) | 174 (31.6) | 182 (33.1) | 83 (15.1) | 21 (3.8) |
| Work is not appreciated despite doing the best | 157 (28.5) | 200 (36.4) | 116 (21.1) | 56 (10.2) | 20 (3.6) |
| Everything is too much | 86 (15.7) | 174 (31.7) | 174 (31.7) | 85 (15.5) | 30 (5.5) |
| Times of worry and one cannot stop it | 138 (25.1) | 186 (33.9) | 139 (25.3) | 57 (10.4) | 29 (5.3) |
| Times when being unable to perform as expected | 120 (21.8) | 299 (54.4) | 107 (19.5) | 19 (3.5) | 5 (0.9) |
| Times in which the responsibility for others is a burden | 162 (29.5) | 215 (39.1) | 123 (22.4) | 42 (7.6) | 8 (1.5) |
| Times when the work gets too much | 85 (15.5) | 205 (37.3) | 183 (33.3) | 60 (10.9) | 17 (3.1) |
| Fear of not being able to perform the tasks | 126 (22.9) | 229 (41.6) | 137 (24.9) | 43 (7.8) | 15 (2.7) |
| Times when being overwhelmed with worries | 165 (30.0) | 189 (34.4) | 128 (23.3) | 45 (8.2) | 23 (4.2) |

Table 2. Practice and workplace characteristics during the past three months (n = 550 practice assistants).

Practice characteristics
Practice structure
Working in group practice 296 53.8
Working in single physician practice 147 26.7
Working in practice with several locations 50 9.1
Working in practice with an employed physician 39 7.1
Working in privately owned health center 6 1.1
Medical records
Electronic medical records (EHR) 348 63.3
Paper and electronic records 187 34.0
Practice services
Emergent home visits 515 93.6
Practice offers regular home visits 511 92.9
Nursing home visits* 508 92.4
Tasks of practice assistant during past 3 months
Scheduled appointments 518 94.2
Documented in patients´ EHR 513 93.3
Prepared prescriptions 504 91.6
Pulled up paper health records or opened electronic patient files 500 90.9
Performed phone service 499 90.7
Worked at reception 486 88.4
Obtained blood pressure readings 461 83.8
Performed ECGs 430 78.2
Prepared practice equipment for the day and switched it off in the evening 414 75.3
Performed laboratory work 393 71.5
Supported physician during patient-consultations 363 66.0
Supported billing of statutory health insurance patients 358 65.1
Performed disease-management examinations 332 60.4
Applied long-term blood pressure devices* 327 59.5
Ordered medical supply 284 51.6
Applied long-term ECG* 247 44.9
Ordered office supply 239 43.5
Performed treadmill testing 237 43.1
Supported billing of private patients* 236 42.9
Performed doppler examination of foot vessels/measured ankle-arm index* 103 18.7

*Missing values above 5%

2.5 Statistical analysis

2.5.1 Handling of missing data

The proportion of missing values ranged from 0.2% to 11%. Variables with more than 5% missing data are marked in Tables 1–3. Common imputation methods for supervised learning were applied to handle missing data [20]. The K-nearest neighbors algorithm with k = 10 was used to impute missing values in TICS scores. For continuous variables we used median imputation, and for categorical variables a separate category ‘unknown’ [20].
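As a rough illustration of these three imputation strategies, the following sketch uses scikit-learn's `KNNImputer` and `SimpleImputer` on small synthetic arrays. The data, variable names and the choice of these particular classes are assumptions made for illustration; the text does not state which implementation was actually used.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

rng = np.random.default_rng(0)

# Synthetic TICS item responses: 50 respondents x 12 items, coded 0-4.
tics_items = rng.integers(0, 5, size=(50, 12)).astype(float)
tics_items[3, 4] = np.nan    # simulate two missing answers
tics_items[17, 0] = np.nan

# KNN imputation with k = 10: each gap is filled with the mean of that item
# among the 10 most similar respondents, so imputed values may be non-integer.
tics_imputed = KNNImputer(n_neighbors=10).fit_transform(tics_items)

# Median imputation for a continuous variable (hypothetical ages).
ages = np.array([[25.0], [38.0], [np.nan], [52.0], [41.0]])
ages_imputed = SimpleImputer(strategy="median").fit_transform(ages)

# Separate 'unknown' category for a categorical variable (hypothetical values).
contract = ["open-ended", None, "fixed-term", "open-ended"]
contract_imputed = [c if c is not None else "unknown" for c in contract]
```

Keeping ‘unknown’ as its own category lets a classifier treat missingness itself as information instead of guessing a category.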

2.5.2 Preparation of datasets for machine learning

After pre-processing the data to compare machine learning classifiers, the dataset was split into a ‘training’ and a ‘validation’ dataset. Fig 1 illustrates the study process flow. We used the 10-fold cross validation approach in machine learning models to measure the unbiased prediction accuracy of the models (see Fig 2). Based on the literature, 10 was chosen as the optimal number of folds, which limits the time needed to complete the test while minimizing the bias and variance associated with the validation process [21–23]. K-fold cross validation, also called rotation estimation, is used to minimize the bias associated with the random sampling of the training and holdout data samples when comparing the predictive accuracy of two or more machine learning methods. In this method, the complete dataset (D) is randomly split into k mutually exclusive subsets (the folds: D1, D2, …, Dk) of approximately equal size. The classification model is trained and tested k times. Each time (t ∈ {1, 2, …, k}), it is trained on all folds except one (D \ Dt) and tested on the remaining single fold (Dt). The cross validation estimate of the overall accuracy is calculated as the average of the k individual accuracy measures:

$$\mathrm{CVA} = \frac{1}{k}\sum_{i=1}^{k} A_i \qquad (1)$$

where CVA stands for cross-validation accuracy, k is the number of folds used, and $A_i$ is the accuracy measure of fold i [21].

Fig 1. Machine learning data extraction process flow.

Fig 2. K-Fold cross validation.
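The procedure can be sketched in plain Python as follows. To keep the example self-contained, the "model" is a deliberately trivial majority-class predictor and the labels are synthetic; both are assumptions for illustration and are not the classifiers or data of this study.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Randomly split the indices 0..n_samples-1 into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_class(labels):
    """Placeholder 'model': always predict the most frequent training label."""
    return max(set(labels), key=labels.count)


def cross_validation_accuracy(labels, k=10, seed=0):
    """Return the CVA (mean of the k fold accuracies) and the fold accuracies."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for t in range(k):
        test_idx = folds[t]
        # Train on every fold except fold t.
        train_idx = [i for j, fold in enumerate(folds) if j != t for i in fold]
        prediction = majority_class([labels[i] for i in train_idx])
        correct = sum(labels[i] == prediction for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k, accuracies


# Synthetic labels: 23 positive and 77 negative cases.
labels = [1] * 23 + [0] * 77
cva, fold_accuracies = cross_validation_accuracy(labels, k=10)
```

Because the placeholder model always predicts the majority class, the CVA here simply equals the share of negative cases; a real classifier would be fitted on the training folds inside the loop.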

2.5.3 Logistic regression as standard statistical procedure

Logistic Regression (LR) is a classical statistical modelling procedure to analyze one dependent dichotomous or binary outcome and one or more nominal, ordinal, interval or ratio-level independent variables. LR models are frequently applied to exposure-event studies in medical research, because they can be used to estimate the model predictors’ odds ratio [24]. All variables significant in bivariate analysis were included in the logistic regression model.

2.5.4 Machine learning approaches

1) K-Nearest Neighbors (KNN) classifies an object by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer). If k = 1, the object is simply assigned to the class of its nearest neighbor. KNN is a type of instance-based or lazy learning where the function is only approximated locally and all computation is deferred until classification [25, 26]. In this study, we used KNN applying k = 10 neighbors, which are the ten closest observations in multidimensional space based on Euclidean distance function to model the training dataset.

2) Support Vector Machine (SVM) represents the observations as points in multidimensional space and searches for the maximum-margin hyperplane that separates the outcome classes. SVM generates the hyperplane in an iterative manner to minimize the error. A basic SVM is a non-parametric linear classifier that creates a hyperplane using the Euclidean distance function from the nearest input values to determine the target states. In order to obtain probability estimates, a logistic regression model is fitted to the output of the support vector machine [25]. In this study, the SVM classifier used an RBF (radial basis function) kernel, a training error of 1.0E-12, and a default hyperplane boundary tolerance of 1.0E-03. To obtain proper probability estimates, we used the option that fits calibration models to the outputs of the SVM.

3) Random Forest (RF) is a collection of decision trees, each constructed on a bootstrapped sample and using a random subset of the possible predictors at each node. RF is used to reduce the variance associated with decision trees [27, 28]. In this study, the forest consisted of 1,000 randomly constructed individual trees. A large number of trees increases the predictive accuracy of RF models, and the forest does not require extensive tuning [29]. Due to the insensitivity of error rates to the number of features selected to split each node, we used the default of a random sample of √n predictors at each node, with n being the total number of predictors under consideration. The predicted probability was derived as the average prediction across all trees.

4) Artificial Neural Network (ANN) is a computational and flexible model that expresses complex non-linear relationships among features, which consist of an interconnected group of variables. A basic ANN model consists of three layers of neurons, i.e. input, output, and hidden layer. These layers can learn from data iteratively through a backpropagation classifier. It trains a multilayer perceptron with one hidden layer, an input layer with the number of nodes equal to the sum of features, and an output layer [30]. This study used a multilayer Perceptron classifier with one hidden layer, a learning rate value with decay of 0.3, and a momentum rate for the backpropagation classifier of 0.2. Suitable ranges for these parameters are within 0.15–0.8 for learning rate and 0.1–0.4 for momentum [30].
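To make the majority-vote rule described for KNN under 1) concrete, here is a minimal pure-Python sketch with Euclidean distance. The toy points, labels and the helper name `knn_predict` are hypothetical; the study itself used k = 10 on the full feature set.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equally long numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=10):
    """Assign the query to the most common class among its k nearest neighbors."""
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy training data: two well-separated clusters (illustrative only).
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = [0, 0, 0, 1, 1, 1]

pred_low = knn_predict(train_points, train_labels, (0.5, 0.5), k=3)
pred_high = knn_predict(train_points, train_labels, (5.5, 5.5), k=3)
```

No model is fitted in advance: all distance computations happen at prediction time, which is why KNN is called a lazy learner.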

Development of the models was completed using Python (Version 3.7.3) and Python’s Scikit-Learn library (https://scikit-learn.org/stable/).
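A hedged sketch of how the five models could be configured in scikit-learn with the hyperparameters named above is given below. Several details are assumptions and not taken from the study: the synthetic data from `make_classification`, feature standardization, stratified folds, the hidden-layer size of 10, the SGD solver, and the iteration limits. The "training error of 1.0E-12" reported for the SVM has no obvious scikit-learn counterpart and is left out. The demonstration run uses 100 trees only to keep it fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(n_trees=1000):
    """Configure the baseline and the four ML classifiers (illustrative)."""
    return {
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "KNN": make_pipeline(
            StandardScaler(),
            KNeighborsClassifier(n_neighbors=10, metric="euclidean"),
        ),
        # probability=True fits a calibration model on the SVM outputs.
        "SVM": make_pipeline(
            StandardScaler(), SVC(kernel="rbf", tol=1e-3, probability=True)
        ),
        "RF": RandomForestClassifier(
            n_estimators=n_trees, max_features="sqrt", random_state=0
        ),
        # One hidden layer; its size (10) is an assumption.
        "ANN": make_pipeline(
            StandardScaler(),
            MLPClassifier(
                hidden_layer_sizes=(10,),
                solver="sgd",
                learning_rate_init=0.3,
                momentum=0.2,
                max_iter=500,
                random_state=0,
            ),
        ),
    }


def cv_auc(model, X, y, k=10):
    """Mean AUC over k stratified cross-validation folds."""
    folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    return cross_val_score(model, X, y, cv=folds, scoring="roc_auc").mean()


# Synthetic stand-in data with roughly 23% positive cases.
X, y = make_classification(
    n_samples=300, n_features=20, weights=[0.77, 0.23], random_state=0
)
results = {name: cv_auc(model, X, y) for name, model in build_models(100).items()}
```

The resulting AUC values depend entirely on the synthetic data and say nothing about the figures reported in this article.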

3. Results

3.1 Sociodemographic and workplace characteristics of the study population

The dataset comprised results of 550 PrAs from 136 general practices. The vast majority of the PrAs were female (98.9%), with a mean age of 38 years (SD 12.6). Regarding marital status, 50.6% (n = 277) of the PrAs were married. On average, they had worked in the current practice for 18.8 years (SD 12.5), 32.5% of them part-time.

3.2. Primary outcome: Strain due to chronic stress

The TICS score of the population ranged from 0 to 44 with a mean of 17.2 and median of 17.0. In the total dataset, 22.7% (n = 125) had high strain due to chronic stress versus 77.3% (n = 425) with low strain due to chronic stress. Regarding socio-demographic characteristics, personnel with high strain due to chronic stress showed the following significant differences compared to those with low strain: older PrAs (mean 38.76) vs. younger PrAs (mean 24.36), unmarried PrAs (29.4%) vs. married PrAs (17%). Caring for next of kin did not differ between groups. No gender-specific analysis was performed, because PrAs were predominantly female (98.9%). All regression and machine learning approaches were applied to the dataset with female subjects only (n = 546).

3.3. Results of four machine learning classifiers

3.3.1 Prediction accuracy

The performance of the machine learning classifiers was assessed using the validation dataset by calculating Harrell’s c-statistic, a measure of the total area under the receiver operating characteristic curve (AUC) [31]. The results showed an AUC of 0.844 (95%CI, 0.684–0.843) for RF, 0.760 (95%CI, 0.605–0.777) for ANN, 0.787 (95%CI, 0.634–0.802) for SVM, and 0.707 (95%CI, 0.556–0.735) for KNN.
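For a binary outcome, the c-statistic equals the share of (case, non-case) pairs in which the case receives the higher predicted score, with ties counted as one half. The following sketch computes it directly from that definition; the scores and labels are made up for illustration.

```python
def c_statistic(scores, labels):
    """Concordance of predicted scores with binary labels (equals the AUC)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # case ranked above non-case
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted probabilities and true labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = c_statistic(scores, labels)  # 7 of 9 pairs are concordant
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.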

3.3.2 Classification analysis

Corresponding results for sensitivity and positive predictive value (PPV) of the machine learning models were 99% and 79% for RF, 87% and 85% for ANN, 87% and 86% for SVM, and 99% and 78% for KNN.

3.4. Results of Logistic regression analysis

In bivariate analysis, the following factors were significantly associated with strain: persons in household below age 18, marital status, age, working hours/week, room equipment, work status, and, as duties in practice, performed laboratory work, obtained blood pressure readings, and performed doppler examination of foot vessels/measured ankle-arm index. The c-statistic for logistic regression showed an AUC of 0.636 (95%CI, 0.490–0.674). This model predicted 316 of 425 total cases correctly, with a sensitivity of 75% and a positive predictive value (PPV) of 44%.

3.5. Comparison of ML and regression analysis

The prediction accuracy according to the discrimination (AUC c-statistic) value is shown in Table 5 for all models. All machine learning models achieved improvements in AUC compared to the standard logistic regression model: +20.8% for RF, +15.1% for SVM, +12.4% for ANN, and +7.1% for KNN. Random forest performed best of the four machine learning classifiers. Compared with the logistic regression baseline model, the RF classifier resulted in a net increase of 104 correctly classified cases with strain due to chronic stress, increasing the sensitivity to 99% and the PPV to 79%. See Table 6 for more details on the machine learning models.

Table 5. Performance of the machine learning algorithms predicting chronic stress derived from applying training algorithms on the validation dataset.

A higher c-statistic indicates better algorithm discrimination. The baseline (BL) standard logistic regression model is provided for comparative purposes.

| Algorithms | AUC c-statistic | 95% Confidence Interval, LCL | 95% Confidence Interval, UCL | Absolute change in AUC (%) |
|---|---|---|---|---|
| BL: Logistic Regression | 0.636 | 0.490 | 0.674 | [Reference] |
| ML: K-nearest Neighbours | 0.707 | 0.556 | 0.735 | +7.1% |
| ML: Support Vector Machine | 0.787 | 0.634 | 0.802 | +15.1% |
| ML: Artificial Neural Network | 0.760 | 0.605 | 0.777 | +12.4% |
| ML: Random Forest | 0.844 | 0.684 | 0.843 | +20.8% |
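The last column of Table 5 is an absolute difference in AUC, expressed in percentage points relative to the logistic regression baseline. The short calculation below reproduces it from the AUC values in the table; the variable names are illustrative.

```python
BASELINE_AUC = 0.636  # logistic regression (Table 5)

ml_auc = {
    "K-nearest Neighbours": 0.707,
    "Support Vector Machine": 0.787,
    "Artificial Neural Network": 0.760,
    "Random Forest": 0.844,
}

# Absolute change in AUC, in percentage points, rounded to one decimal.
absolute_change = {
    name: round((auc - BASELINE_AUC) * 100, 1) for name, auc in ml_auc.items()
}

for name, change in absolute_change.items():
    print(f"{name}: +{change:.1f}")
```

For random forest, for example, 0.844 − 0.636 = 0.208, i.e. +20.8 percentage points.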

Table 6. Full details on classification analysis.

| Algorithms | Chronic stress cases correct (True Positive) | Chronic stress cases incorrect (False Negative) | Total chronic stress cases | Non-chronic stress cases correct (True Negative) | Non-chronic stress cases incorrect (False Positive) | Total non-chronic stress cases | Sensitivity (True Positive) | Positive Predictive Value (PPV) |
|---|---|---|---|---|---|---|---|---|
| Logistic Regression | 316 | 109 | 425 | 68 | 57 | 125 | 0.751 | 0.440 |
| ML: Random Forest | 420 | 5 | 425 | 15 | 110 | 125 | 0.988 | 0.792 |
| ML: K-nearest Neighbours | 421 | 4 | 425 | 6 | 119 | 125 | 0.991 | 0.780 |
| ML: Support Vector Machine | 369 | 56 | 425 | 66 | 59 | 125 | 0.868 | 0.862 |
| ML: Artificial Neural Network | 369 | 56 | 425 | 59 | 66 | 125 | 0.868 | 0.848 |
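Sensitivity and PPV follow from the cell counts via the standard definitions, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The sketch below applies them to the counts listed for the four machine learning models in Table 6; the function and variable names are illustrative.

```python
def sensitivity(tp, fn):
    """Share of true cases that the model identifies."""
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Share of predicted cases that are true cases."""
    return tp / (tp + fp)


# (TP, FN, TN, FP) for the four ML models as listed in Table 6.
counts = {
    "Random Forest": (420, 5, 15, 110),
    "K-nearest Neighbours": (421, 4, 6, 119),
    "Support Vector Machine": (369, 56, 66, 59),
    "Artificial Neural Network": (369, 56, 59, 66),
}

metrics = {
    name: (
        round(sensitivity(tp, fn), 3),
        round(positive_predictive_value(tp, fp), 3),
    )
    for name, (tp, fn, tn, fp) in counts.items()
}
```

For random forest this gives 420 / 425 ≈ 0.988 and 420 / 530 ≈ 0.792, matching the values in the table.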

3.6. Variable rankings in machine learning models

Of the four ML approaches used, variable importance could only be determined for the artificial neural network and random forest. The artificial neural network model uses the overall weighting of the variables within the model. Random forest ranks variable importance based on the selection frequency of the variable as a decision node in the decision trees. KNN does not provide a method for determining the importance or coefficients of variables. We used a nonlinear SVM classifier with an RBF kernel, which has no variable importance method. For the logistic regression model, variable importance was determined by the coefficient effect size. Factors identified by logistic regression, such as persons in household below age 18, age below 35 years, and insufficient room equipment, were also identified by ANN and RF. The factors ranked highest by both ANN and RF were work-related characteristics such as too much work, high demand to concentrate, time pressure, complicated tasks, and insufficient practice room conditions (see Table 7).

Table 7. The most influential predictor variables associated with chronic stress listed by coefficient effect size (Standard logistic regression) weighting (Artificial neural network) and selection frequency (Random forest).

| Standard model: Logistic regression | Coefficient | Machine learning model: Artificial Neural Network | Weight (%) | Machine learning model: Random Forest | Frequency |
|---|---|---|---|---|---|
| Obtained blood pressure readings | 0.951 | Too much work | 39.7 | Too much work | 0.73 |
| Persons in household below age 18 | 0.349 | High demand to concentrate | 39.3 | High demands to concentrate | 0.71 |
| Working hours/week more than 40 | 0.121 | Time pressure | 36.7 | Time pressure | 0.70 |
| Work status | -0.109 | Complicated tasks | 31.5 | Complicated tasks | 0.67 |
| Performed laboratory work | 0.091 | Insufficient practice room conditions | 18.1 | Age ≤ 35 | 0.63 |
| Employment contract | 0.063 | Interrupted during work | 14.9 | Insufficient support by practice leaders | 0.52 |
| Age ≤ 35 | 0.045 | Persons in household below age 18 | 13.8 | Insufficient workplace environment | 0.51 |
| Insufficient workplace environment | 0.028 | Working hours/week more than 40 hours | 12.7 | Insufficient practice room conditions | 0.50 |
| Performed doppler examination of foot vessels/measured ankle-arm index | 0.018 | Workplace environment | 12.3 | Holding together well | 0.48 |
| Marital status/single | 0.006 | Number of practitioners in the practice | 10.6 | Influence on work assigned | 0.43 |

4. Discussion

To the best of our knowledge, this study is the first to use machine learning for a better understanding of stress in primary care practice personnel. Comparing four common machine learning (ML) approaches to a classical statistical procedure, we showed that all four machine learning approaches provided more accurate models for the prediction of strain due to chronic stress than standard regression analysis. Random forest showed the highest accuracy, with workload, high demand to concentrate, and time pressure being the most important factors associated with chronic stress. These factors were also identified in other studies of the target populations, GPs and GP practice personnel. Addressing job satisfaction, Harris et al. identified time pressure as the most frequent stressor in a study with 626 Australian practice staff in 96 general practices [12]. Studying 158 Canadian family physicians, Lee et al. determined the following occupational stressors as relevant: challenging patients, high workload, time limitations, competency issues, challenges of documentation and practice management, and changing roles within the workplace [13, 32]. Similarly, Hoffmann et al. showed that work disruption was a relevant negative workplace factor in a study with 550 practice assistants [33]. These stressors are described as contributing to poor physician well-being and adverse patient outcomes such as low patient satisfaction [34]. The relevance of such chronic psychological burden is tremendous, as physiological responses to stress have been shown to negatively affect, e.g., memory, immune system functions, the function of the cardiovascular system, and brain electric activity [35, 36].

4.1 Comparison to other ML analyses

A few studies from other medical fields have compared standard statistical and ML approaches, with results similar to ours. Machine learning is considered a branch of artificial intelligence, which extracts meaningful patterns from data and develops prediction models using several algorithms [37]. ML approaches integrate many different levels of data to develop new approaches to the classification of medical issues such as chronic stress, which can be linked more precisely to interventions for a given individual. Better model accuracy with machine learning was also found in a UK study on cardiovascular risk prediction. Using routine clinical data of 378,256 patients, four machine learning algorithms (random forest, logistic regression, gradient boosting, and neural network) were compared to an established algorithm (American College of Cardiology guidelines) to predict a first cardiovascular event over 10 years [38]. The neural network performed best, improving predictive accuracy by 3.6% compared to the baseline algorithm. Using a dataset with 9,502 heart failure patients and a one-year follow-up, a US study compared four machine learning methods (least absolute shrinkage and selection operator regression, classification and regression trees, random forests, and gradient boosted modeling (GBM)) with logistic regression as a classical statistical procedure to predict four heart failure outcomes. The C statistic results for all outcomes show that ML methods were better calibrated and that the gradient-boosted model (GBM) was the most consistent ML modeling approach [39]. In the field of oncology, a large American study on breast cancer survival compared two ML algorithms (artificial neural network and decision trees) to classical logistic regression using a large dataset with more than 200,000 cases. The decision tree approach was the best predictor with 93.6% prediction accuracy, followed by artificial neural network with 91.2% and LR with 89.2% [40].
Overall, machine learning approaches yielded more accurate results than classical methods in our and the above-mentioned studies.

4.2 Strength and limitations

The key strength of this study is the comparison of a range of machine learning approaches in the field of healthcare workers’ well-being. Chronic stress measurement approaches based on self-reported questionnaires [17, 41] are subjective and cannot provide immediate information about the state of a person. Continuous stress monitoring using data mining technology helps to better understand stress patterns and also provides better insights into possible future interventions.

Limitations of this study include the rather small sample size and the large number of predictor variables (features), which poses a risk of overfitting [42, 43]. The amount and quality of the data are key determinants of predictive accuracy. Furthermore, our data source contained practice assistants from the German region only, which limits generalizability and requires validation in populations from other countries where job tasks and challenges might be different. Although the data collection was conducted in 2014, the results still apply to German practices, except that the COVID-19 pandemic likely increased workload and psychological burden, which we are currently evaluating in an ongoing study [11]. Prospectively, research using continuous stress monitoring and data mining technologies will help to better understand stress patterns and provide even deeper insights for possible future interventions.

5. Conclusion

Compared to logistic regression as a classical statistical procedure, this study showed that all machine learning classifiers provided more accurate models for the prediction of chronic stress in practice assistants, with random forest performing best. Identification of chronic stress is of importance for the well-being and productivity of practice assistants. Random forest (RF) identified prominent predictor variables (features) that influence chronic stress, which should be considered when developing interventions to reduce chronic stress.

Acknowledgments

We would like to thank all participating practices for their support of the study.

Data Availability

The manuscript’s data cannot be shared publicly because of ethical restrictions as our dataset includes potentially identifying information of personnel in general practices. Data requests may be sent to the institutional ethics committee of Universitätsklinikum Bonn (ethik@ukbonn.de).

Funding Statement

The authors received no specific funding for this work.

References

  • 1. Schreibauer EC, Hippler M, Burgess S, Rieger MA, Rind E. Work-Related Psychosocial Stress in Small and Medium-Sized Enterprises: An Integrative Review. Int J Environ Res Public Health. 2020; 17. Epub 2020/10/13. doi: 10.3390/ijerph17207446.
  • 2. Dreher A, Theune M, Kersting C, Geiser F, Weltermann B. Prevalence of burnout among German general practitioners: Comparison of physicians working in solo and group practices. PLoS One. 2019; 14:e0211223. doi: 10.1371/journal.pone.0211223.
  • 3. Luken M, Sammons A. Systematic Review of Mindfulness Practice for Reducing Job Burnout. Am J Occup Ther. 2016; 70:7002250020p1–7002250020p10. doi: 10.5014/ajot.2016.016956.
  • 4. Alzoubi KH, Abdel-Hafiz L, Khabour OF, El-Elimat T, Alzubi MA, Alali FQ. Evaluation of the Effect of Hypericum triquetrifolium Turra on Memory Impairment Induced by Chronic Psychosocial Stress in Rats: Role of BDNF. Drug Des Devel Ther. 2020; 14:5299–314. Epub 2020/12/01. doi: 10.2147/DDDT.S278153.
  • 5. Datta D, Arnsten AFT. Loss of Prefrontal Cortical Higher Cognition with Uncontrollable Stress: Molecular Mechanisms, Changes with Age, and Relevance to Treatment. Brain Sci. 2019; 9. Epub 2019/05/17. doi: 10.3390/brainsci9050113.
  • 6. Sanford LD, Suchecki D, Meerlo P. Stress, arousal, and sleep. Curr Top Behav Neurosci. 2015; 25:379–410. doi: 10.1007/7854_2014_314.
  • 7. Hu Y, Visser M, Kaiser S. Perceived Stress and Sleep Quality in Midlife and Later: Controlling for Genetic and Environmental Influences. Behav Sleep Med. 2020; 18:537–49. Epub 2019/06/23. doi: 10.1080/15402002.2019.1629443.
  • 8. Kaldewaij R, Koch SBJ, Volman I, Toni I, Roelofs K. On the Control of Social Approach-Avoidance Behavior: Neural and Endocrine Mechanisms. Curr Top Behav Neurosci. 2017; 30:275–93. doi: 10.1007/7854_2016_446.
  • 9. Viehmann A, Kersting C, Thielmann A, Weltermann B. Prevalence of chronic stress in general practitioners and practice assistants: Personal, practice and regional characteristics. PLoS One. 2017; 12:e0176658. doi: 10.1371/journal.pone.0176658.
  • 10. Hapke U, Maske UE, Scheidt-Nave C, Bode L, Schlack R, Busch MA. Chronischer Stress bei Erwachsenen in Deutschland: Ergebnisse der Studie zur Gesundheit Erwachsener in Deutschland (DEGS1). Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2013; 56:749–54. doi: 10.1007/s00103-013-1690-9.
  • 11. Weltermann BM, Kersting C, Pieper C, Seifried-Dübon T, Dreher A, Linden K, et al. IMPROVEjob—Participatory intervention to improve job satisfaction of general practice teams: a model for structural and behavioural prevention in small and medium-sized enterprises—a study protocol of a cluster-randomised controlled trial. Trials. 2020; 21:532. doi: 10.1186/s13063-020-04427-7.
  • 12. Harris MF, Proudfoot JG, Jayasinghe UW, Holton CH, Powell Davies GP, Amoroso CL, et al. Job satisfaction of staff and the team environment in Australian general practice. Med J Aust. 2007; 186:570–3. doi: 10.5694/j.1326-5377.2007.tb01055.x.
  • 13. Lee FJ, Stewart M, Brown JB. Stress, burnout, and strategies for reducing them: what’s the situation among Canadian family physicians. Can Fam Physician. 2008; 54:234–5.
  • 14. Jaccard J. Interaction effects in factorial analysis of variance. Thousand Oaks, Calif.: Sage; 2005.
  • 15. Obermeyer Z, Emanuel EJ. Predicting the Future—Big Data, Machine Learning, and Clinical Medicine. N Engl J Med. 2016; 375:1216–9. doi: 10.1056/NEJMp1606181.
  • 16. Weng W-H. Machine Learning for Clinical Predictive Analytics. In: Celi LA, Majumder MS, Ordóñez P, Osorio JS, Paik KE, et al., editors. LEVERAGING BIG DATA IN GLOBAL HEALTH. [S.l.]: SPRINGER NATURE; 2020. pp. 199–217.
  • 17. Petrowski K, Paul S, Albani C, Brähler E. Factor structure and psychometric properties of the trier inventory for chronic stress (TICS) in a representative German sample. BMC Med Res Methodol. 2012; 12:42. doi: 10.1186/1471-2288-12-42.
  • 18. Schulz P, Schlotz W. Trierer Inventar zur Erfassung von chronischem Streß (TICS): Skalenkonstruktion, teststatistische Überprüfung und Validierung der Skala Arbeitsüberlastung. Diagnostica. 1999; 45:8–19. doi: 10.1026//0012-1924.45.1.8.
  • 19. Prümper J, Hartmannsgruber K, Frese M. KFZA–Kurzfragebogen zur Arbeitsanalyse. 1995. Available from: https://fragebogen-arbeitsanalyse.at/help.
  • 20. Poulos J, Valle R. Missing Data Imputation for Supervised Learning. Applied Artificial Intelligence. 2018; 32:186–96. doi: 10.1080/08839514.2018.1448143.
  • 21. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. 1995.
  • 22. Jiang G, Wang W. Error estimation based on variance analysis of k-fold cross-validation. Pattern Recognition. 2017; 69:94–106. doi: 10.1016/j.patcog.2017.03.025.
  • 23. Steyerberg EW. Validation in prediction research: the waste by data splitting. J Clin Epidemiol. 2018; 103:131–3. Epub 2018/07/29. doi: 10.1016/j.jclinepi.2018.07.010.
  • 24. Hilbe JM. Logistic regression models. Boca Raton, London, New York: CRC Press; 2017.
  • 25. Kuhn M, Johnson K. Applied predictive modeling. 5th ed. New York: Springer; 2016.
  • 26. Boehmke BC, Greenwell B. Hands-on machine learning with R. Boca Raton, London, New York: CRC Press; 2020.
  • 27. Breiman L. Random Forests. Machine Learning. 2001; 45:5–32. doi: 10.1023/A:1010933404324.
  • 28. Denisko D, Hoffman MM. Classification and interaction in random forests. Proc Natl Acad Sci U S A. 2018; 115:1690–2. Epub 2018/02/12. doi: 10.1073/pnas.1800256115.
  • 29. Probst P, Wright MN, Boulesteix A-L. Hyperparameters and tuning strategies for random forest. WIREs Data Mining Knowl Discov. 2019; 9. doi: 10.1002/widm.1301.
  • 30. Smith LN. A disciplined approach to neural network hyper-parameters: Part 1—learning rate, batch size, momentum, and weight decay. US Naval Research Laboratory Technical Report. 2018. Available from: https://arxiv.org/pdf/1803.09820.
  • 31. Newson R. Confidence Intervals for Rank Statistics: Somers’ D and Extensions. The Stata Journal. 2006; 6:309–34. doi: 10.1177/1536867X0600600302.
  • 32. Lee FJ, Brown JB, Stewart M. Exploring family physician stress: helpful strategies. Can Fam Physician. 2009; 55:288–289.e6. Available from: https://www.cfp.ca/content/55/3/288.short.
  • 33. Hoffmann J, Kersting C, Weltermann B. Practice assistants’ perceived mental workload: A cross-sectional study with 550 German participants addressing work content, stressors, resources, and organizational structure. PLoS One. 2020; 15:e0240052. doi: 10.1371/journal.pone.0240052.
  • 34. Shanafelt TD, West C, Zhao X, Novotny P, Kolars J, Habermann T, et al. Relationship between increased personal well-being and enhanced empathy among internal medicine residents. J Gen Intern Med. 2005; 20:559–64. doi: 10.1111/j.1525-1497.2005.0108.x.
  • 35. Yaribeygi H, Panahi Y, Sahraei H, Johnston TP, Sahebkar A. The impact of stress on body function: A review. EXCLI J. 2017; 16:1057–72. doi: 10.17179/excli2017-480.
  • 36. Lotfan S, Shahyad S, Khosrowabadi R, Mohammadi A, Hatef B. Support vector machine classification of brain states exposed to social stress test using EEG-based brain network measures. Biocybernetics and Biomedical Engineering. 2019; 39:199–213. doi: 10.1016/j.bbe.2018.10.008.
  • 37. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. Journal of Biomedical Informatics. 2002; 35:352–9. doi: 10.1016/s1532-0464(03)00034-0.
  • 38. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine-learning improve cardiovascular risk prediction using routine clinical data. PLoS One. 2017; 12:e0174944. doi: 10.1371/journal.pone.0174944.
  • 39. Desai RJ, Wang SV, Vaduganathan M, Evers T, Schneeweiss S. Comparison of Machine Learning Methods With Traditional Models for Use of Administrative Claims With Electronic Medical Records to Predict Heart Failure Outcomes. JAMA Netw Open. 2020; 3:e1918962. doi: 10.1001/jamanetworkopen.2019.18962.
  • 40. Delen D, Walker G, Kadam A. Predicting breast cancer survivability: a comparison of three data mining methods. Artificial intelligence in medicine. 2005; 34:113–27. doi: 10.1016/j.artmed.2004.07.002.
  • 41. Slavich GM, Shields GS. Assessing Lifetime Stress Exposure Using the Stress and Adversity Inventory for Adults (Adult STRAIN): An Overview and Initial Validation. Psychosom Med. 2018; 80:17–27. doi: 10.1097/PSY.0000000000000534.
  • 42. Jovic A, Brkic K, Bogunovic N. A review of feature selection methods with applications. 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). IEEE; 5/25/2015–5/29/2015. pp. 1200–5.
  • 43. Vabalas A, Gowen E, Poliakoff E, Casson AJ. Machine learning algorithm validation with a limited sample size. PLoS One. 2019; 14:e0224365. Epub 2019/11/07. doi: 10.1371/journal.pone.0224365.

Decision Letter 0

Alfredo Vellido

23 Dec 2020

PONE-D-20-23593

Chronic stress in practice assistants: an Analytic approach comparing four machine learning classifiers with a standard logistic regression model

PLOS ONE

Dear Dr. Bozorgmehr,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 06 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Alfredo Vellido

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please clarify in your Methods section how the dataset was obtained by the authors of this study, and whether there was any ethical oversight over the data collection for this study. Please state whether or not the authors had access to any identifying information.

3.We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear Authors

the manuscript entitled "Chronic stress in practice assistants: an Analytic approach comparing four machine learning classifiers with a standard logistic regression model" has several interesting and strategic findings and i certainly used it in the real life. the chronic stress is very important issue in our society for health. thank you for your study.

some minor revision is only needed. although the manuscript is too long and the part of method was explained in the details but i do not recommend to summarize because it help some readers who not know intelligent artificial method.

another limitation that you mentioned is that certainly the important features of work in the chronic stress is dependent to several factor of social-psychological-economy- culture. then these factors were important in that community or culture. however the results can give us a closer look at reality. i suggest that the researchers analyse again the classifiers only in the women subjects and clear the men. i think the accuracy of results increases.

Abstract: the background of abstract should be brief and general and the explanation of study add to the method. the names of questionnaires enter in the method.

introduction: change the phrase of chronic strain to chronic stress. strain is force that could cause stress and stress is mental state and used as impairment.

results: the table 6: the stress is add to this topic:Total Non-(stress) Cases =

if possible add the weight to main features listed in the table 7

discussion: as a strategic note you add the some complications due to chronic stress for general health or some studies that measure the effect of stress on brain function or other parts

The impact of stress on body function: A review

H Yaribeygi, Y Panahi, H Sahraei, TP Johnston, A Sahebkar

EXCLI journal 16, 1057

Support vector machine classification of brain states exposed to social stress test using EEG-based brain network measures

S Lotfan, S Shahyad, R Khosrowabadi, A Mohammadi, B Hatef

Biocybernetics and Biomedical Engineering 39 (1), 199-213

Reviewer #2: Dear authors, your manuscript is interesting but I need you to answer some questions:

INTRODUCTION

- The introduction is very short. The constructs and concepts necessary to understand the manuscript are not explained.

- Page 4, paragraph 1, lines 47-50: this information should go in the “Method” section.

METHODS

Data source:

- What categories are there among the "general practice personnel"? The authors must describe the sample used.

- What was the target population? How was the sample chosen? The authors must specify it.

DISCUSSION

Page 21, lines 272-297: The first two paragraphs do not contribute anything new and repeat information about the results.

Limitations

The information is from 2014. There was a global economic crisis that affected working conditions. It should be said since currently, working conditions are not equivalent.

REFERENCES

Many bibliographies are obsolete and some citations are incomplete. The bibliographic citations used are more than 5 years old (57,1%). The authors must update and arrange the bibliography.

Too many references do not meet the journal guidelines and that have errors. The authors should review this section.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Boshra Hatef

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 May 4;16(5):e0250842. doi: 10.1371/journal.pone.0250842.r002

Author response to Decision Letter 0


16 Feb 2021

Response to Editor

PONE-D-20-23593

Chronic stress in practice assistants: an Analytic approach comparing four machine learning classifiers with a standard logistic regression model

PLOS ONE

Dear Dr. Vellido,

We would like to thank you and the reviewers for the very helpful suggestions.

Please find enclosed our revision and answers to the open items.

Best regards,

Arezoo Bozorgmehr

Editor comments:

1. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future.

Reply: Not applicable.

2. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

Reply: Done.

3. Please clarify in your Methods section how the dataset was obtained by the authors of this study, and whether there was any ethical oversight over the data collection for this study. Please state whether or not the authors had access to any identifying information.

Reply: Thank you, we clarified this. The information regarding the ethics statement was already included (please see P.7, lines 130-134). We have now highlighted this by inserting a new headline and added the requested aspects.

Text: P. 7, lines 131-136 (document of revised Manuscript with track changes):

“2.2 Ethics statement:

Ethical approval for survey had been obtained from the Ethics Committee of the Medical Faculty of the University of Duisburg-Essen (reference number: 13-5536-BO, date of approval: 24/11/2014). All participants had received written information and signed informed consent forms. The principal investigator of the study (B.W) and coauthor of this manuscript provided the data for this analysis.”

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

Reply: The manuscript’s data cannot be shared publicly because of ethical restrictions as our dataset includes potentially identifying information of personnel in general practices. Data requests may be sent to the institutional ethics committee (ethik@ukbonn.de).

5. In your revised cover letter, please address the following prompts:

a. If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

Reply: The manuscript’s data cannot be shared publicly because of ethical restrictions as our dataset includes potentially identifying information of personnel in general practices. Data requests may be sent to the institutional ethics committee (ethik@ukbonn.de).

Reviewer reports:

Reviewer #1:

1. although the manuscript is too long and the part of method was explained in the details but i do not recommend to summarize because it help some readers who not know intelligent artificial method.

Reply: Thank you.

2. another limitation that you mentioned is that certainly the important features of work in the chronic stress is dependent to several factor of social-psychological-economy- culture. then these factors were important in that community or culture. however the results can give us a closer look at reality. i suggest that the researchers analyse again the classifiers only in the women subjects and clear the men. i think the accuracy of results increases.

Reply: Thank you for this suggestion. We fully agree and had used only the dataset with the women for our analyses. This is outlined in the manuscript.

Revised text: Please see section 3.2, line 254-257 (document of revised Manuscript with track changes).

“No gender-specific distribution was applied, because PrAs were predominantly female (98.9%). All regression and machine learning approaches were applied to the dataset with female subjects only (n=546).”

3. Abstract: the background of abstract should be brief and general and the explanation of study add to the method. the names of questionnaires enter in the method.

Reply: Thank you for your suggestion, we changed this.

Revised text: Please see abstract (document of revised Manuscript with track changes):

“Background

Occupational stress is associated with adverse outcomes for medical professionals and patients. In our cross-sectional study with 136 general practices, 26.4% of 550 practice assistants showed high chronic stress. As machine learning strategies offer the opportunity to improve understanding of chronic stress by exploiting complex interactions between variables, we used data from our previous study to derive the best analytic model for chronic stress: four common machine learning (ML) approaches are compared to a classical statistical procedure.”

“Methods

We applied four machine learning classifiers (random forest, support vector machine, K-nearest neighbors’, and artificial neural network) and logistic regression as standard approach to analyze factors contributing to chronic stress in practice assistants. Chronic stress had been measured by the standardized, self-administered TICS-SSCS questionnaire. The performance of these models was compared in terms of predictive accuracy based on the ‘operating area under the curve’ (AUC), sensitivity, and positive predictive value.”

4. Introduction: change the phrase of chronic strain to chronic stress. strain is force that could cause stress and stress is mental state and used as impairment.

Reply: Done

Revised text: Please see page 5, line 104 (document of revised Manuscript with track changes): “… associated with chronic stress in practice assistants …”

5. results: the table 6: the stress is add to this topic:Total Non-(stress) Cases =

Reply: Thank you for your hint, we corrected this.

Revised text: Please see table 6.

6. if possible add the weight to main features listed in the table 7

Reply: We added this information as suggested, thank you.

Revised text: Please see table 7.

7. discussion: as a strategic note you add the some complications due to chronic stress for general health or some studies that measure the effect of stress on brain function or other parts

The impact of stress on body function: A review

H Yaribeygi, Y Panahi, H Sahraei, TP Johnston, A Sahebkar

EXCLI journal 16, 1057

Support vector machine classification of brain states exposed to social stress test using EEG-based brain network measures

S Lotfan, S Shahyad, R Khosrowabadi, A Mohammadi, B Hatef

Biocybernetics and Biomedical Engineering 39 (1), 199-213

Reply: Thank you for these interesting articles, which we added to our paper.

Revised text: Please see references on page 22, lines 352-354 (document of revised Manuscript with track changes).

“The relevance of such chronic psychological burden is tremendous as it was shown that physiological responses due to stress negatively affect e.g. memory, immune system functions, the function of the cardiovascular system, and brain electric activity [35,36].”

Reviewer #2:

8. INTRODUCTION

a) The introduction is very short. The constructs and concepts necessary to understand the manuscript are not explained.

Reply: Thank you, we revised the text profoundly. First, we refer to the construct of stress as developed by Selye and Lazarus. Second, we outline the construct of practices being multi-parameter systems, which affect professionals working there. Third, we outlined the concept of machine learning as analytic strategy more in detail.

Revised text: Please see pages 4-6, lines 59-115 (document of revised Manuscript with track changes):

“Occupational stress is an important issue for health care and other workers worldwide [1]. Following stress models introduced by Selye, Lazarus and others, it was shown that chronic stress can lead to adverse (mental) health effects such as burnout or depression [2,3]. Also, stress can produce temporary or even permanent alterations in memory [4], cognition [5], arousal/sleep [6,7], and coping behaviours [8]. In our prior study with 214 general practitioners (GPs) and 550 practice assistants from 136 German general practices, we showed that 19.9% of the male GPs (n = 141), 35.6% of the female GPs (n = 73) and 26.4% of the practice assistants (PrAs) had high chronic stress [9]. Overall, the mean prevalence of high chronic stress was 26.3% in this workforce, which is more than twice the prevalence in the general population (11%) studied in the representative German Health Interview and Examination Survey for Adults (DEGS1) with more than 7,900 participants [10,11]. Analyzing for various work and (regional) practice characteristics, we showed that only the weekly working hours correlated with high chronic stress in GPs and PrAs.

However, to develop effective prevention strategies, a more profound understanding of the factors causing and/or contributing to high psychological strain at the individual and group level is needed. As workplaces are typically complex, multifactorial social organizations, appropriate statistical methods are needed to analyse complex associations and cause-effect relationships. Prior studies addressing impaired psychological well-being in primary care workers used standard statistical procedures such as prevalence ratios and logistic regression models to evaluate associations [9,12,13]. These statistical approaches usually simplify the complex relationships between the independent variables (features) and the response (dependent) variable: they assume that each independent variable is linked to the outcome by a linear statistical function. This is especially problematic in datasets with large numbers of non-linear relationships and interaction effects between independent variables, which make the model more complex [14]. Nowadays, machine learning (ML) approaches offer new opportunities to evaluate complex relationships. Conceptually, ML has the benefit that it efficiently exploits complex and non-linear interactions between variables by minimizing the error between predicted and observed response variables and improves the accuracy of the models compared to standard approaches [15,16]. Using a large dataset on practice assistants available from our prior study, we aim to develop a better understanding of workplace factors associated with chronic stress in practice assistants using machine learning. Thus, we compare four machine learning classifiers (random forest, support vector machine, K-nearest neighbors, artificial neural network) with a standard logistic regression model using standard measures of test accuracy, i.e. to derive the best prediction model for chronic stress in practice assistants in primary care.

Regarding terminology, we would like to point out that we use the term “prediction” as used in the context of machine learning: it refers to the output of an algorithm after it has been trained on a dataset and applied to new data to forecast the likelihood of a particular outcome. In contrast, in epidemiological analyses, a (risk) prediction model refers to a mathematical equation that uses patient characteristics (risk factors) to estimate the probability of a defined outcome prospectively.”

b) Page 4, paragraph 1, lines 47-50: this information should go in the “Method” section.

Reply: Thank you for your advice. We have now clarified in the introduction that these results stem from our previous publication on chronic stress in GPs and practice assistants. In addition, we included this information in the methods section.

Revised text: Please see page 7, lines 127-130 (document of revised Manuscript with track changes).

“We documented that 26.4% of the 550 practice assistants (PrAs) had high chronic stress, as well as 19.9% of the male (n = 141) and 35.6% of the female (n = 73) general practitioners (GPs) [9]. In this workforce, the overall proportion of workers with high chronic stress was 26.3% (n = 201).”
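The revised introduction quoted under item 8a compares classifiers by standard accuracy measures, and the abstract names these as AUC, sensitivity, and positive predictive value. The following is a minimal sketch of how such measures can be computed from binary labels and model scores, not the authors' analysis code. The function names, the 0.5 decision threshold, and all labels and scores are hypothetical values chosen for illustration only; they are not study data.

```python
# Illustrative sketch only: hypothetical labels and scores, not study data.

def sensitivity_ppv(y_true, y_pred):
    """Return (sensitivity, positive predictive value) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sens, ppv


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison: the share of
    (positive, negative) pairs in which the positive case receives the
    higher score, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Summarise one model; the threshold of 0.5 is an assumption."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    sens, ppv = sensitivity_ppv(y_true, y_pred)
    return {"auc": auc(y_true, scores), "sensitivity": sens, "ppv": ppv}


# Hypothetical held-out cases: 1 = high chronic stress, 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores_model_a = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
scores_model_b = [0.6, 0.3, 0.2, 0.7, 0.5, 0.4, 0.1, 0.05]

result_a = evaluate(y_true, scores_model_a)
result_b = evaluate(y_true, scores_model_b)
```

In this toy comparison, model A ranks the positive cases above the negative ones more consistently than model B and therefore obtains the higher AUC. In practice the scores would come from classifiers fitted on training data and applied to held-out data, in line with the machine learning sense of "prediction" described in the quoted text.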

9. METHODS

Data source:

a) What categories are there among the "general practice personnel"? The authors must describe the sample used.

Reply: Thank you, we added the information in the introduction and methods section.

Revised text: Please see Methods section, lines 119-127 (document of revised Manuscript with track changes):

“The dataset used for the analyses was derived from our cross-sectional study addressing stress among general practice personnel (GPs, PrAs), which was performed among general practices belonging to the teaching practice network of the Institute for General Medicine, University Hospital Essen, Essen, Germany. A total of 764 professionals from 136 practices had taken part in the survey, which was performed in 2014. The design of the study and key results addressing the 214 GPs (practice owners and employed physicians) and 550 practice assistants (PrAs) (including medical secretaries and practice assistants in training) have been published [9]. This analysis addresses chronic stress in the 550 practice assistants (PrAs), who are the largest professional group in general practices.”

b) What was the target population? How was the sample chosen? The authors must specify it.

Reply: Thank you. We clarified this in the introduction and methods section. The target population was the 550 practice assistants (PrAs) from the 136 practices of the teaching practice network.

Revised text: Please see page 7, lines 119-127 (document of revised Manuscript with track changes).

“The dataset used for the analyses was derived from our cross-sectional study addressing stress among general practice personnel (GPs, PrAs), which was performed among general practices belonging to the teaching practice network of the Institute for General Medicine, University Hospital Essen, Essen, Germany. A total of 764 professionals from 136 practices had taken part in the survey, which was performed in 2014. The design of the study and key results addressing the 214 GPs (practice owners and employed physicians) and 550 practice assistants (PrAs) (including medical secretaries and practice assistants in training) have been published [9]. This analysis addresses chronic stress in the 550 practice assistants (PrAs), who are the largest professional group in general practices.”

10. DISCUSSION

a) Page 21, lines 272-297: The first two paragraphs do not contribute anything new and repeat information about the results.

Reply: We fully agree; thank you for pointing this out. We revised the text.

Revised text: Please see pages 21-22, lines 311-354 (document of revised Manuscript with track changes).

“To the best of our knowledge, this study is the first to use machine learning for a better understanding of stress in primary care practice personnel. Comparing four common machine learning (ML) approaches to a classical statistical procedure, we showed that all four machine learning approaches provided more accurate models for the prediction of strain due to chronic stress than a standard regression analysis. Random forest showed the highest accuracy, with workload, high demand to concentrate, and time pressure being the most important factors associated with chronic stress. These factors were also identified in other studies in the target populations of GPs and GP practice personnel. Addressing job satisfaction, Harris et al. identified time pressure as the most frequent stressor in a study with 626 Australian practice staff in 96 general practices [12]. Studying 158 Canadian family physicians, Lee et al. determined the following occupational stressors as relevant: challenging patients, high workload, time limitations, competency issues, challenges of documentation and practice management, and changing roles within the workplace [13,32]. Similarly, Hoffmann et al. showed that work disruption was a relevant negative workplace factor in a study with 550 practice assistants [33]. These stressors are described as contributing to poor physician well-being and adverse patient outcomes such as low patient satisfaction [34]. The relevance of such chronic psychological burden is tremendous as it was shown that physiological responses due to stress negatively affect e.g. memory, immune system functions, the function of the cardiovascular system, and brain electric activity [35,36].”

11. Limitations

a) The information is from 2014. There was a global economic crisis that affected working conditions. This should be stated, since current working conditions are not equivalent.

Reply: The working conditions in German general practices have not changed in recent years (except during the current pandemic). Workplaces are secure, there are no insolvencies of practices, and the income of practices is a mixture of reimbursement by the statutory health insurances and private patients. The migration influx in 2015 led to more patients in the system, but for each practice the numbers were small. Also, the gross national product of Germany remained stable (https://en.wikipedia.org/wiki/Gross_national_income).

Revised text: Please see page 24, lines 399-401 (document of revised Manuscript with track changes).

“Although the data collection was conducted in 2014, the results still apply to German practices, except that the COVID pandemic likely increased workload and psychological burden, which we are currently evaluating in an ongoing study [11].”

12. REFERENCES

a) Many references are obsolete and some citations are incomplete. More than half of the bibliographic citations used (57.1%) are more than 5 years old. The authors must update and arrange the bibliography.

Too many references do not meet the journal guidelines or contain errors. The authors should review this section.

Reply: Thank you, we reviewed the literature again and updated the references. On the other hand, we continue to refer to important studies in the field even if they are older than 5 years. Now only 32.5% of the citations are older than 5 years.

Revised text: Please see the References section.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Alfredo Vellido

15 Apr 2021

Chronic stress in practice assistants: an Analytic approach comparing four machine learning classifiers with a standard logistic regression model

PONE-D-20-23593R1

Dear Dr. Bozorgmehr,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Alfredo Vellido

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Dear authors,

Thanks for your reply. The explanations of the authors are satisfactory. The paper has greatly improved its quality.

Congratulations on your work.

Best regards

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Acceptance letter

Alfredo Vellido

19 Apr 2021

PONE-D-20-23593R1

Chronic stress in practice assistants: an Analytic approach comparing four machine learning classifiers with a standard logistic regression model

Dear Dr. Bozorgmehr:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Alfredo Vellido

Academic Editor

PLOS ONE

Associated Data

    Data Availability Statement

    The manuscript’s data cannot be shared publicly because of ethical restrictions as our dataset includes potentially identifying information of personnel in general practices. Data requests may be sent to the institutional ethics committee of Universitätsklinikum Bonn (ethik@ukbonn.de).


    Articles from PLoS ONE are provided here courtesy of PLOS
