Prediction of generalized anxiety levels during the Covid-19 pandemic: A machine learning-based modeling approach

Faisal Mashel Albagmi; Aisha Alansari; Deema Saad Al Shawan; Heba Yaagoub AlNujaidi; Sunday O Olatunji

doi:10.1016/j.imu.2022.100854

. 2022 Jan 19;28:100854. doi: 10.1016/j.imu.2022.100854

PMCID: PMC8766246 PMID: 35071730

Abstract

The rapid spread of Covid-19 led many countries to enforce precautionary measures such as complete lockdowns. These lifestyle-altering measures caused a significant increase in anxiety levels globally. Decision-makers therefore need methods to anticipate and prevent public mental health crises. Machine learning has proven effective in the early prediction of several diseases. This study therefore aims to classify anxiety early, as both a two-class and a three-class problem, using a dataset collected during the Covid-19 pandemic in Saudi Arabia. The data were collected from 3017 participants from all regions of the Kingdom via an online survey containing questions to identify factors influencing anxiety levels, followed by the questions of the GAD-7, a screening tool for Generalized Anxiety Disorder. The prediction models were built using the Support Vector Machine classifier, chosen for its robust outcomes on medical data, and the J48 Decision Tree, chosen for its interpretability and comprehensibility. The experiments yielded promising results for the early classification of both the two-class and three-class anxiety problems. The Support Vector Machine classifier outperformed the J48 Decision Tree, attaining a classification accuracy of 100%, precision of 1.0, recall of 1.0, and f-measure of 1.0 using 10 features.

Keywords: COVID-19, Pandemic, Anxiety, Machine learning, Saudi Arabia

1. Introduction

One of the shared global experiences of the COVID-19 pandemic is the experience of “lockdown.” However, such strict restrictions vary from country to country and change over time. The consequences of lockdowns for mental health have been substantial [1], [2], [3]. Lockdown conditions lead to social isolation and confinement, which can affect the population's mental health. Furthermore, this crisis can have a broader impact on education, work, and everyday life, with implications for mental health services. The Lancet Psychiatry has highlighted the need for mental health services during lockdowns, especially for the most vulnerable groups such as students, medical professionals, and women [4]. As precautionary measures affect large portions of the population, mental health problems are expected to increase globally [1,5,6]. According to the Anxiety and Depression Association of America, the most common mental health problem in the United States is Generalized Anxiety Disorder (GAD). Approximately one-third of the American population suffers from GAD, but fewer than half of those affected have access to mental healthcare [7].

Despite extensive research, the magnitude and underlying factors of GAD during lockdown remain unknown. However, evidence suggests that early identification and access to mental health treatment may help mitigate the impact of mental health problems, especially GAD. Using technology such as telepsychiatry for frontline medical workers and vulnerable populations during lockdowns could alleviate the mental health effects of the COVID-19 pandemic on the affected population [2]. Research could also guide mental health services such as telepsychiatry toward the most vulnerable population groups, especially during pandemic lockdowns [3]. For that reason, machine learning can be a powerful tool that enables decision-makers to customize mental health services according to the predicted needs of these different subpopulations.

Early preparation for potential mental health needs is crucial to preventing a mental health crisis. Thompson and his colleagues investigated the delay in seeking treatment for anxiety and mood disorders and found that people delay seeking help for around 8.2 years. They reported two main indicators associated with this delay: slower problem recognition and younger age at onset. They also noted that older people take longer to make initial treatment contact. Early prediction of anxiety using machine learning models could help reduce such delays [8].

Several studies have assessed the psychological impact of the pandemic on the population of Saudi Arabia, which enforced a complete lockdown in March 2020 [9]. However, most of these studies do not model the collective effects of GAD on the population during a pandemic. This study addresses this gap by using supervised machine learning algorithms, an explainable artificial intelligence approach, to capture the joint multivariate distribution underlying extensive survey data collected across Saudi Arabia during a lockdown. The choice of algorithms focused on models that have proved successful in various medical applications, namely the Support Vector Machine (SVM) and the J48 Decision Tree (DT). The Support Vector Machine is a well-established algorithm for both classification and regression, with many successful applications in several fields, including medical and biomedical ones. The decision tree is known for its clarity and for being easy to understand even for non-computer professionals, which, coupled with its strong performance in various applications, makes it appealing to medical and public health professionals. Empirical results showed that SVM outperformed the J48 Decision Tree using the ten most highly correlated features and the optimized hyperparameters, achieving 100% accuracy, precision of 1.0, recall of 1.0, and f-measure of 1.0 in binary anxiety classification [10]. Although the J48 Decision Tree achieved lower performance measures, the highest being 95% accuracy, it offers better explainability, helping non-computer professionals understand how the developed models work. The potential of the proposed machine learning models to mitigate the late effects of anxiety cannot be overemphasized.

2. Review of related literature

The public mental health crisis during COVID-19 has been studied by several researchers worldwide. According to a study conducted in China in 2020, one-third of the participants reported moderate-to-severe anxiety, and more than half of the participants reported a moderate-to-severe psychological impact [11].

Several studies have assessed the psychological impact of the pandemic on the Saudi population. For instance, Albagmi and his colleagues assessed the prevalence of anxiety and associated factors during the lockdown period at the peak of the outbreak in Saudi Arabia. A total of 3,017 respondents from all five main regions of Saudi Arabia completed the survey. The results indicated that 19.6% of the respondents had a moderate to severe level of anxiety during the COVID-19 pandemic [9]. The factors associated with a higher level of anxiety included being female, being a student, being single or divorced, and living with a family member who is vulnerable to COVID-19.

Similarly, another study conducted in Saudi Arabia measured the impact of the pandemic on the psychological disposition of 2081 Saudi residents and citizens. According to the results, 7.3% of the respondents had anxiety. The researchers also concluded that the individuals more likely to develop depression during the pandemic included non-Saudis, divorcees, the elderly, and university students. The factors that correlated with a higher level of anxiety included “Saudi individuals, married people, the unemployed and those with a high income” [12]. Another study investigated anxiety levels among students in Saudi Arabia during the COVID-19 pandemic. It revealed that 35% of students experienced moderate to severe anxiety, and that female and fourth-year students were more anxious than their counterparts [13].

In recent years, there has been increasing interest in using machine learning models to predict anxiety disorders. These prediction models appeal to decision-makers because they can anticipate the potential outcomes of different courses of action. Such tools are useful for assessing the potential impact of public mental health crises and for understanding the dynamics of their associated factors. Pintelas and colleagues [14] conducted a systematic review of machine learning prediction methods for anxiety disorders. They concluded that the accuracy of these studies depends on the type of prediction method and on how the data were acquired, whether as clinical data or through self-report or screening tools. Across the 16 studies examined, the most frequently used methods for predicting post-traumatic stress disorder (PTSD) and Seasonal affective disorder (SAD) were hybrid methods and the Support Vector Machine (SVM), respectively. Artificial neural networks (ANNs) and ensemble methods achieved the highest prediction scores.

Boeke and his colleagues used neuroimaging measurements to predict trait anxiety with machine learning models evaluated by k-fold cross-validation in a discovery sample of 531 participants (307 women). They concluded that, across the different methods they tried, there was no evidence of a generalizable anxiety biomarker [15]. Other studies have predicted GAD among women using data acquired from a self-screening survey. Husain et al. [16] found that the random forest approach showed high prediction accuracy (0.9). Jothi et al. also investigated this in 2021, using the Shapley value for feature selection to predict GAD among women in Malaysia.

Elhai et al. collected cross-sectional data from 908 adults from Eastern China. The questionnaire was distributed between 24 February and 15 March 2020, when strict social distancing measures were in place [17]. The authors adopted several instruments to measure Generalized Anxiety Disorder and other mental illnesses, including the GAD-7, the Depression Anxiety Stress Scale-21 (DASS-21), and the Ruminative Responses Scale. Additionally, the participants were queried about the magnitude of their exposure to pandemic-related news. The researchers then used multiple machine learning algorithms to identify vulnerability factors for COVID-19-influenced anxiety and the perceived threat of death. The study identified several predictors of anxiety severity, such as stress, rumination, the threat of death from COVID-19, age, negative consequences of illness, news exposure to coronavirus, and the participant's sex.

Thompson and his colleagues investigated the delay in seeking treatment for anxiety and mood disorders. They found that people delay seeking help for around 8.2 years. They reported two main indicators associated with this delay, slower problem recognition and younger age at onset, and noted that older people take longer to make initial treatment contact. Early prediction of anxiety using machine learning models could help reduce such delays [8]. Additional studies, methods, results, and limitations are described in Table 1.

Table 1.

Overview of the reviewed sources arranged by date of publication.

Author(s) Citation	Title of article or chapter	Objective	Method	Findings	Limitations
[17]	Modeling anxiety and fear of COVID-19 using machine learning in a sample of Chinese adults: associations with psychopathology, sociodemographic, and exposure variables	To examine vulnerability factors associated with increased anxiety and fear.	The researchers used the R caret package for machine learning, with the algorithm-specific packages glmnet (lasso, ridge, and elastic net regression), rf (random forest), xgbTree (extreme gradient boosted regression), and svmRadial (support vector machine with a radial basis function kernel).	Stress and rumination were the most relevant variables in modeling COVID-19-related anxiety intensity, according to shrinkage machine learning methods. The most powerful predictor of perceived COVID-19 death threat was health anxiety.	Data came from one geographical area (China). Only self-report measures of psychopathology were included.
[18]	Predictive modeling of depression and anxiety using electronic health records and a novel machine learning approach with artificial intelligence	To identify important predictors of GAD and MDD risk using artificial intelligence.	A novel machine learning process was used to re-analyze data from an observational study to tackle the problem of predicting MDD and GAD. The pipeline is an algorithmically diverse collection of machine learning approaches, including deep learning.	Being comfortable with living conditions and having public health insurance were the two most important factors in predicting MDD. Up-to-date vaccinations and marijuana usage were the two most powerful predictors of GAD. The findings show that machine learning algorithms for detecting GAD and MDD based on EHR data have a moderate predictive performance.	The original screening for MDD and GAD outcomes may not have identified all cases in the community. The data originate from French college students, who are likely to have different baselines than other psychiatric populations.
[19]	Predicting generalized anxiety disorder among women using Shapley value	To predict GAD among women using the Shapley value.	On the mental health data set, the Shapley value was used for feature selection ahead of the data mining classifiers.	Feature selection improved the results of the prediction models (Naïve Bayes, Random Forest, and J48).	Small sample size (180 participants).
[15]	Toward Robust Anxiety Biomarkers: A Machine Learning Approach in a Large-Scale Sample.	To predict trait anxiety from neuroimaging measurements in humans.	They compared a suite of neuroimaging-based machine learning models using Python to predict anxiety within a discovery sample (n = 531, 307 women) via k-fold cross-validation. The final model (a stacked model incorporating region-to-region functional connectivity, amygdala seed-to-voxel connectivity, and volumetric and cortical thickness data) was then evaluated in a held-out, unseen test sample (n = 348, 209 women).	The stacked model was able to predict anxiety within the discovery sample but failed to generalize to the holdout sample.	The researchers studied a limited set of brain phenotypes and applied a circumscribed set of approaches. They did not analyze a clinical sample. The imaging sequences used lack the spatial and temporal precision of current approaches.
[20]	Assessment of Anxiety, Depression and Stress using Machine Learning Models	To predict anxiety, depression, and stress using 8 algorithms.	Using data from the online DASS42 tool, eight machine learning algorithms were used to predict the occurrence of psychological issues such as anxiety, depression, and stress.	The prediction accuracy obtained by utilizing the hybrid algorithm was higher than that obtained by using single methods, although the radial basis function network, which falls within the category of neural networks, yielded the highest accuracy.	NA
[6]	Learning the Mental Health Impact of COVID-19 in the United States with Explainable Artificial Intelligence	To focus on learning a ranked list of factors that could indicate a predisposition to a mental disorder during the COVID-19 pandemic.	They surveyed 17,764 adults in the United States and used Bayesian network inference to identify key factors affecting mental health during the COVID-19 pandemic.	Using the Bayesian network model, they discovered that patients with a chronic mental disease were more susceptible to mental problems during the COVID-19 pandemic.	The data analyzed are limited to one geographical area (the United States).
[21]	Screening of anxiety and depression among seafarers using machine learning technology	To compare the performance of different machine learning algorithms for screening of anxiety and depression among seafarers.	After obtaining the required approval and ethical clearance, 470 sailors were interviewed at the Haldia Dock Complex in India. Five machine learning classifiers, i.e., CatBoost, Logistic Regression, Naïve Bayes, Random Forest, and Support Vector Machine, were evaluated using the Python programming language.	They found that CatBoost appeared to be the best classifier for predicting anxiety and depression, with accuracy and precision of 82.6% and 84.1%, respectively.	The study emphasized the application of machine learning technology in the field of automated screening for mental health illness.
[22]	Detecting anxiety on Reddit	To detect anxiety-related posts from Reddit using various linguistic features.	Studied anxiety disorders through personal narratives collected from the popular social media website Reddit.	Applied N-gram language modeling, vector embeddings, topic analysis, and emotional norms to generate features that accurately classify posts related to binary levels of anxiety.	They achieved an accuracy of 91% with vector-space word embeddings, and an accuracy of 98% when combined with lexicon-based features.

Variable	Label
Q3	Nationality
Q18	Gender
Q19	Age
Q20	Marital status
Q21	How many people are in the house? (Includes house workers and drivers)
Q22	Are you or any of your household members at increased risk of contracting the coronavirus? (This includes anyone over the age of 60 or pregnant or having comorbidities)
Q24A1	Have you tested positive for COVID-19?
Q24A2	Have you been suspected of carrying the coronavirus?
Q24A3	Has any member of your family been diagnosed with coronavirus?
Q25	Qualification
Q26	Occupation
Q28	What is the method followed by your employer, or academic institution during the pandemic? (Online or in person)
Q30	Feeling nervous, anxious, or on edge
Q31	Not being able to stop or control worrying
Q32	Worrying too much about different things
Q33	Trouble relaxing
Q34	Being so restless that it's hard to sit still
Q35	Becoming easily annoyed or irritable
Q36	Feeling afraid as if something awful might happen
Q37	How difficult have these problems made it to do work, take care of things at home, or get along with other people
Geo-region	Geographical region
Anxiety (Two category)	Anxiety two categories (Anxious and non-anxious)
Anxiety (Three category)	Anxiety score three categories (Mild-Moderate-Severe)
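
Items Q30 to Q36 in the table above are the seven GAD-7 questions, each scored from 0 to 3, so a total score lies between 0 and 21. The sketch below illustrates how such a total could be mapped onto the two target attributes. The cut points of 10 and 15 are the standard published GAD-7 thresholds for moderate and severe anxiety; grouping everything below 10 as "Mild" and treating a total of 10 or more as "Anxious" is an assumption about how this dataset was labelled, and the function names are hypothetical.

```python
GAD7_ITEMS = ["Q30", "Q31", "Q32", "Q33", "Q34", "Q35", "Q36"]


def gad7_total(responses):
    """Sum the seven GAD-7 items (each scored 0-3) into a 0-21 total."""
    scores = [responses[item] for item in GAD7_ITEMS]
    if any(score not in (0, 1, 2, 3) for score in scores):
        raise ValueError("each GAD-7 item must be scored 0, 1, 2, or 3")
    return sum(scores)


def three_category(total):
    """Map a total score to a three-level label."""
    # Standard GAD-7 cut points: 10 = moderate, 15 = severe.
    # Assumption: all totals below 10 are grouped as "Mild".
    if total >= 15:
        return "Severe"
    if total >= 10:
        return "Moderate"
    return "Mild"


def two_category(total):
    """Map a total score to a binary label."""
    # Assumption: a total of 10 or more counts as "Anxious".
    return "Anxious" if total >= 10 else "Non-anxious"


# Made-up respondent, for illustration only.
example = {"Q30": 2, "Q31": 1, "Q32": 2, "Q33": 2, "Q34": 1, "Q35": 2, "Q36": 1}
total = gad7_total(example)
print(total, three_category(total), two_category(total))
```

If the targets were derived this way, they would be a deterministic function of Q30 to Q36, which is worth keeping in mind when reading the correlation and accuracy tables.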

Attributes	Mean	Median	Standard Deviation	Max.	Min.
Q3	1.063	1	0.242	2	1
Q18	1.560	2	0.496	2	1
Q19	3.307	3	1.300	6	1
Q20	1.731	2	0.560	4	1
Q21	6.733	7	3.026	30	0
Q22	1.651	2	0.477	2	1
Q24A1	0.002	0	0.048	1	0
Q24A2	0.006	0	0.075	1	0
Q24A3	0.010	0	0.099	1	0
Q25	3.731	4	0.954	5	1
Q26	2.730	2	1.487	6	1
Q30	1.056	1	1.046	3	0
Q31	0.638	0	0.910	3	0
Q32	0.930	1	0.982	3	0
Q33	0.700	0	0.941	3	0
Q34	0.754	0	0.976	3	0
Q35	0.768	0	0.966	3	0
Q36	0.627	0	0.900	3	0
Q37	1.696	2	0.712	4	1
Geo-region	1.022	1	0.989	4	0
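
The five summary statistics in the table above can be reproduced for any encoded survey column with the Python standard library. This is a minimal sketch: the response values are made up, the `describe` helper is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was reported.

```python
import statistics


def describe(values):
    """Return the summary statistics used in the table above."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Assumption: sample standard deviation (divides by n - 1).
        "std": statistics.stdev(values),
        "max": max(values),
        "min": min(values),
    }


# Made-up responses to a single 0-3 item, for illustration only.
q30_sample = [0, 1, 1, 3, 0, 2, 1, 0]
summary = describe(q30_sample)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```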

Attributes	Target Attribute	Correlation coefficient
Q31	Anxiety Two category	0.69032
Q32	Anxiety Two category	0.68472
Q30	Anxiety Two category	0.68466
Q33	Anxiety Two category	0.67673
Q36	Anxiety Two category	0.65965
Q35	Anxiety Two category	0.58508
Q34	Anxiety Two category	0.54546
Q37	Anxiety Two category	0.48791
Q19	Anxiety Two category	0.14877
Q22	Anxiety Two category	0.11936
Q26	Anxiety Two category	0.09987
Q20	Anxiety Two category	0.08589
Q18	Anxiety Two category	0.06622
Q24A2	Anxiety Two category	0.05201
Geo-region	Anxiety Two category	0.05052
Q3	Anxiety Two category	0.02726
Q24A3	Anxiety Two category	0.02619
Q21	Anxiety Two category	0.01486
Q25	Anxiety Two category	0.01195
Q24A1	Anxiety Two category	0.00648

Attributes	Target Attribute	Correlation coefficient
Q31	Anxiety Three category	0.64316
Q30	Anxiety Three category	0.63942
Q32	Anxiety Three category	0.63888
Q33	Anxiety Three category	0.63119
Q36	Anxiety Three category	0.61451
Q35	Anxiety Three category	0.54564
Q34	Anxiety Three category	0.50835
Q37	Anxiety Three category	0.45479
Q19	Anxiety Three category	0.13954
Q22	Anxiety Three category	0.11146
Q26	Anxiety Three category	0.09348
Q20	Anxiety Three category	0.08045
Q18	Anxiety Three category	0.06233
Q24A2	Anxiety Three category	0.04852
Geo-region	Anxiety Three category	0.04767
Q3	Anxiety Three category	0.02526
Q24A3	Anxiety Three category	0.02427
Q21	Anxiety Three category	0.01428
Q25	Anxiety Three category	0.01328
Q24A1	Anxiety Three category	0.00807
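
The two correlation tables above rank each attribute by its correlation with the target. A minimal sketch of such a ranking follows, assuming a Pearson coefficient and ranking by absolute value; the text does not state which coefficient was used, and the data and the `rank_features` helper are purely illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target, k):
    """Return the k column names with the largest absolute correlation."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up encoded responses for six respondents, for illustration only.
columns = {
    "Q31": [0, 1, 2, 3, 3, 0],
    "Q19": [1, 3, 2, 4, 1, 5],
    "Q25": [4, 4, 3, 5, 4, 4],
}
target = [0, 0, 1, 1, 1, 0]  # hypothetical binary anxiety label

top_two = rank_features(columns, target, k=2)
print(top_two)
```

Selecting the top k names from such a ranking is one way to build the reduced feature sets of 10, 5, 3, 2, and 1 attributes.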

Number of features	Accuracy of SVM	Accuracy of J48	Average accuracy of each set of features
Using 20 Features	100%	95.79%	97.90%
Using 10 Features	100%	95.96%	97.98%
Using 5 Features	95.76%	95.00%	95.38%
Using 3 Features	92.97%	93.27%	93.12%
Using 2 Features	91.95%	91.51%	91.73%
Using 1 Feature	90.19%	90.19%	90.19%

Number of features	Accuracy of SVM	Accuracy of J48	The average accuracy of each set of features
Using 20 Features	100%	92.81%	96.40%
Using 10 Features	100%	93.50%	96.75%
Using 5 Features	93.14%	91.48%	92.31%
Using 3 Features	89.63%	89.96%	89.79%
Using 2 Features	87.11%	88.66%	87.89%
Using 1 Feature	85.18%	86.77%	85.98%
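
The last column of the two tables above is the mean of the SVM and J48 accuracies for each feature-set size. The short check below recomputes it from the reported values and picks the feature-set size with the highest mean. The dictionary names are hypothetical, and reading the first table as the two-class problem and the second as the three-class problem is an assumption based on their order.

```python
# feature count -> (SVM accuracy, J48 accuracy, reported average), all in percent
first_table = {
    20: (100.0, 95.79, 97.90),
    10: (100.0, 95.96, 97.98),
    5: (95.76, 95.00, 95.38),
    3: (92.97, 93.27, 93.12),
    2: (91.95, 91.51, 91.73),
    1: (90.19, 90.19, 90.19),
}
second_table = {
    20: (100.0, 92.81, 96.40),
    10: (100.0, 93.50, 96.75),
    5: (93.14, 91.48, 92.31),
    3: (89.63, 89.96, 89.79),
    2: (87.11, 88.66, 87.89),
    1: (85.18, 86.77, 85.98),
}


def average_accuracy(svm, j48):
    """Mean of the two classifiers' accuracies."""
    return (svm + j48) / 2


def best_feature_count(table):
    """Feature-set size with the highest mean accuracy."""
    return max(table, key=lambda k: average_accuracy(table[k][0], table[k][1]))


# Largest gap between a recomputed mean and the reported one (rounding only).
max_gap = max(
    abs(average_accuracy(svm, j48) - reported)
    for table in (first_table, second_table)
    for svm, j48, reported in table.values()
)
print(best_feature_count(first_table), best_feature_count(second_table), max_gap)
```

In both tables the 10-feature set gives the highest mean, which is consistent with the 10-feature configuration reported in the abstract.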

Performance Measure	SVM	J48
Accuracy (%)	100	95.96
Precision	1	0.974
Recall	1	0.975
f-measure	1	0.975

		Predicted
		Anxiety	Non-Anxiety
Actual	Anxiety	2425 (TP)	0 (FN)
Actual	Non-Anxiety	0 (FP)	592 (TN)
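
The accuracy, precision, recall, and f-measure reported for the SVM follow directly from the four cells of the confusion matrix above. The sketch below uses the standard definitions of these measures; the function name is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard classification measures from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Cell values taken from the confusion matrix above.
svm_two_class = binary_metrics(tp=2425, fn=0, fp=0, tn=592)
print(svm_two_class)
```

With no off-diagonal counts, every measure evaluates to 1.0, matching the SVM column of the performance table.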

		Predicted
		Mild	Moderate	Severe
Actual	Mild	2425	0	0
	Moderate	0	247	0
	Severe	0	0	345
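
For the three-class matrix above, precision and recall are usually computed per class by treating each class in turn as the positive one. The following sketch does this for a square confusion matrix; the helper names are hypothetical, and the text does not say how the three-class measures were averaged.

```python
def per_class_metrics(matrix, labels):
    """Per-class (precision, recall); matrix[i][j] counts actual i predicted as j."""
    results = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        actual_total = sum(matrix[i])                    # row sum: tp + fn
        predicted_total = sum(row[i] for row in matrix)  # column sum: tp + fp
        precision = tp / predicted_total if predicted_total else 0.0
        recall = tp / actual_total if actual_total else 0.0
        results[label] = (precision, recall)
    return results


def overall_accuracy(matrix):
    """Share of instances on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


labels = ["Mild", "Moderate", "Severe"]
# Counts taken from the three-class confusion matrix above.
matrix = [
    [2425, 0, 0],
    [0, 247, 0],
    [0, 0, 345],
]

metrics = per_class_metrics(matrix, labels)
accuracy = overall_accuracy(matrix)
print(metrics, accuracy)
```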

Parameters	Optimal value chosen
Kernel	Poly Kernel
C	2
Epsilon	1.0E-12
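
The "Poly Kernel" entry above refers to a polynomial kernel, which scores the similarity of two feature vectors as a power of their dot product, while C is the soft-margin penalty that trades margin width against training errors. A minimal sketch of the kernel function is given below. The exponent and the additive constant are not listed in the table, so the `degree` and `coef0` defaults here are assumptions chosen for illustration.

```python
def poly_kernel(x, y, degree=1, coef0=0.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


# Two made-up encoded respondents, for illustration only.
x = [1, 2, 0]
y = [3, 1, 2]

linear = poly_kernel(x, y)                       # degree 1: plain dot product
quadratic = poly_kernel(x, y, degree=2)          # squared dot product
shifted = poly_kernel(x, y, degree=2, coef0=1.0)
print(linear, quadratic, shifted)
```

With degree 1 and no constant the kernel reduces to a linear one, so the choice of exponent determines whether the SVM can model interactions between survey items.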

Parameters	Optimal value chosen
Confidence Factor	0.45
MinNumObj	2

Parameters	Optimal value chosen
Kernel	Poly Kernel
C	4
Epsilon	1.0E-12

Parameters	Optimal value chosen
Confidence Factor	0.15
MinNumObj	2
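
J48 is an implementation of the C4.5 decision tree, which chooses splits by gain ratio: the information gain of a split divided by the entropy of the split itself. The confidence factor controls pruning (smaller values prune more), and MinNumObj sets the minimum number of instances per leaf. The sketch below illustrates only the gain-ratio criterion for a categorical attribute; it does not implement pruning or the leaf-size constraint, and the data are made up.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of splitting `labels` by the categorical `values`."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition sizes
    return info_gain / split_info if split_info > 0 else 0.0


# Made-up example: one attribute separates the classes, the other does not.
labels = ["Mild", "Mild", "Severe", "Severe"]
informative = [0, 0, 3, 3]
uninformative = [0, 1, 0, 1]

print(gain_ratio(informative, labels), gain_ratio(uninformative, labels))
```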
