Impact of COVID-19 on public social life and mental health: a statistical study of google trends data from the USA

Archi Roy; Soudeep Deb; Divya Chakarwarti

doi:10.1080/02664763.2022.2164562

. 2023 Jan 10;51(3):581–605. doi: 10.1080/02664763.2022.2164562


PMCID: PMC10868428 PMID: 38370267

Abstract

The COVID-19 pandemic has caused a significant disruption in the social lives and mental health of people across the world. This study aims to assess this effect using internet search volume data. We group the widely searched keywords on the internet into several categories that are relevant to analyzing public mental health status. For each category of keywords, we conduct an appropriate statistical analysis to identify significant changes in the search pattern during the course of the pandemic. The binary segmentation method of changepoint detection, in combination with ARMA-GARCH models, is utilized in this analysis. It helps us detect how people's behavior changed in phases and whether the severity of the pandemic brought forth those shifts in behavior. Interestingly, we find that rather than the severity of the outbreak, the long duration of the pandemic has affected the public health status more. The phases, however, align well with the so-called COVID-19 waves and are consistent for different aspects of social and mental health. We further observe that the results are typically similar for different states as well.

Keywords: ARMA-GARCH models, changepoint detection, coronavirus pandemic, google search volume, infodemiology

1. Introduction

The coronavirus pandemic or COVID-19, the fifth documented pandemic in history since the 1918-flu [34], caused a major disruption in people's social and mental well-being. First reported in China in December 2019, it spread rapidly all over the world, causing the World Health Organization to declare it as a pandemic by early March 2020. As a mitigation procedure, governments of most countries imposed various travel restrictions, limitations on public gatherings and sometimes long stay-at-home orders. These ‘lockdown’ measures inevitably hindered people from regular social activities, casual gatherings, and often curbed their access to non-essential public services. Additionally, irrespective of the social or economic classes, people's lives have been affected by spreading of misinformation [16], loss of employment, lack of financial security [61], sudden collapse of public healthcare facility [55], and increasing number of deaths caused by the SARS-CoV-2 virus with no known remedies for a long time. The study [5] highlights the mental health-related issues faced by individuals during this period as a parallel pandemic to the COVID-19 outbreak. Consequently, the status of public mental health, especially during the first phase of the pandemic, piqued the interest of the social science researchers. In fact, even after two years since the pandemic started, it still remains of critical importance and works as a motivation behind our study.

1.1. Brief literature review

In the existing literature, there are many studies that aim to find evidence of increased tendencies of anxiety, depression and loneliness during such national or global level health crises. The studies [9,51] found that individuals exposed to the risk of quarantine tend to experience long-term exhaustion, irritability, lack of concentration, reduced productivity at the workplace and even reluctance to work and a tendency to resign. In a similar work, Ref. [60] showed that such effects may persist for as long as three years. Most of this research focused on the 2002–2004 outbreak of SARS, and brief reviews can be found in Refs [11,37].

In the context of the recent COVID-19 pandemic, the existing literature can broadly be divided into two streams. The first stream of studies depends on data obtained from physically conducted surveys and questionnaires. For example, the study [58] considers a multi-national sample of caregivers for 5- to 18-year-old children and analyzes the relation between their mental health and the disruption from COVID-19. The authors of [26], in their study on the effect of the pandemic on the sleep pattern of adults in the United States of America (USA), show that insomnia and sleep disturbances increased in this time, especially for people younger than 60 years of age. In another pertinent study, Ref. [22] identifies that from March to July of 2020 alone, the proportion of people at risk of clinical depression went up by 90% as compared to the same population prior to the pandemic. The study [49] examines the mental impact of the COVID-19 crisis on patients with neuropsychiatric symptoms like depression, hallucinations, delusion etc. and their caregivers. The study [36] highlights the increased number of cases of internet addiction and abuse, gaming disorder, online gambling and problematic smartphone usage severity, which often, directly or passively, trigger long-term insomnia, loneliness and depression. The study [30] reports increased cases of self-harm and suicidal thoughts in adolescents and young adults during the lockdown. The reader may further refer to Refs [20,27] for reviews of these works. Ref. [28] is also worth mentioning in this regard, as it carried out an interesting comparative study between the effects of SARS and COVID-19 on mental health.

The other stream of research depends on internet search volume data to draw inferences about the public health status. Our focus in this paper is primarily on this approach. One of the main advantages of this over a survey-based approach is the greater degree of heterogeneity obtained through internet search data, which is not always attainable through traditional clinical sampling methods. Naturally, it is extremely helpful in capturing population level information regarding sensitive issues like mental health. The reader may refer to Ref. [32] for a detailed discussion on the advantages of using online query data in understanding social psychology. Another interesting attempt in this line is Ref. [52], which used a hybrid model of five components for modeling and predicting search behaviors related to keywords like ‘depression’, ‘loneliness’ etc. The study [18], on the other hand, assessed the impact of the outbreak on adolescent psychological well-being in terms of compulsive internet usage, escapism, loneliness, depression and gaming addiction. A couple of comparative analyses of people's online behavior before and after the pandemic can be found in Refs [7,62]. The former concentrated on anxiety-related queries searched on the internet, while the latter focused on the queries related to sleep disturbance. Another extensive study to understand the changes in category-wise internet searching behaviors using time series regression models has been conducted by Ref. [46]. The work by Ref. [48] is also extremely relevant in this regard. They leveraged Google Trends data to gauge the effects of lockdown on insomnia, depression, suicidal tendency etc. In another recent study, the authors of [6] focused on identifying the relationship between COVID-19 vaccination and anxiety, and took a panel data modeling approach through difference-in-differences regression to analyze it.

1.2. Our contribution

We advance this branch of research in primarily two directions. First, we add to the existing literature in the aforementioned second stream by using diverse categories of keywords that are highly important in the study of public mental health status. In this article, we consider a few states of the United States of America (USA), one of the countries most affected by COVID-19. It is also one of the most explored countries in terms of research on psychological impacts of the pandemic [20]. We choose four states – Pennsylvania, Florida, Illinois and New York – for this study as they have a large number of internet users as well as sufficient heterogeneity in terms of population, culture, socio-economic status, etc. We shall abbreviate these states in the following sections as ‘PA,’ ‘FL,’ ‘IL’ and ‘NY,’ respectively. Taking multiple states into consideration allows us to understand the similarities and dissimilarities of the impact of the pandemic across different regions, which was missing from the earlier research.

The second important contribution of this paper is the development of a solid statistical methodology, backed by appropriate empirical analysis, to make inference about the problem. In this regard, we utilize a generalized autoregressive conditional heteroskedasticity (GARCH) modeling framework and take into account the changepoints statistically identified in the data, as well as suitable exogenous variables. To the best of our knowledge, such detailed statistical work has not been conducted yet in a similar context.

The rest of this paper is structured as follows. Section 2 describes the data we are dealing with. Next, in Section 3, we take the reader through the methodology used in the paper. Section 4 presents a brief simulation study. Results and corresponding discussions are presented in Section 5. Finally, some important concluding remarks are provided in Section 6.

2. Data

To analyze the effect of COVID-19 on social life, we rely on the Google Search Volume (GSV) data which have been proven to be effective in similar studies. The GSV corresponding to a specific keyword is essentially a score variable, taking values in the range $[0, 100]$ . It is a unit-free measure and represents the relative search interest of that keyword (with respect to a base keyword) on a particular day, based on the number of Google searches containing that keyword. The keywords analyzed in this article are presented in Table 1. The keywords fall broadly into nine categories, which reflect different aspects of mental health and social life. Brief descriptions of every category are also provided in the same table. On the basis of the nature of the keywords and the statistical analysis, the categories can be further put into three groups. The first three categories ‘positive’, ‘negative’ and ‘psychological’ can be collated under the group ‘mental health determinants’; this group is of the most significant interest to us. Next, the categories ‘health and lifestyle’, ‘hygiene’ and ‘vaccine’ are associated with ‘pandemic relevant practices’ while the categories ‘wfh’, ‘parenting’ and ‘hobbies’ are related to the group ‘daily activities and social well-being’.

Table 1.

Categories, keywords and relevant description for the Google Search Volume data collection procedure.

Variable	Keywords	Description
Negative	Burnout, unhappy, failure, death, negativity, tired, loss, etc.	Queries containing negative words or feelings
Positive	Positivity, wellness, success, gain, accomplish, divine, etc.	Queries containing positive words or feelings
Psychological	depression, toxic, suicide, stress, claustrophobia, anxiety, panic, psychiatrist, insomnia, antidepressant, panic, therapy, etc.	Queries related to mental health issues
Health and lifestyle	immunity, lifestyle, healthy, zen, yoga, meditation, exercise, etc.	Queries related to general health and lifestyle
Hygiene	handwash, mask, hygiene, sanitizer, distancing, etc.	Queries related to hygiene practices which were especially highlighted during the pandemic
Vaccine	vaccine, Johnson, JJ, Pfizer, vaccinated, etc.	Queries related to vaccine and vaccination drives
wfh	virtual, work-from-home, home-office, meeting, zoom, office, etc.	Queries related to the recent work from home culture
parenting	parenting, babysitting, creche, childcare, kids, etc.	Queries related to parenting challenges faced during the pandemic
Hobbies	hobby, leisure, gardening, games, indoor, chess, diy, creative, painting, etc.	Queries related to popular hobbies and creative activities


In this paper, we follow a two-phase data collection methodology to obtain the GSV data for the concerned period of time. The raw data is extracted and cleaned in the first phase, whereas in the second, the cleaned data is expressed on a relative scale, normalized and finally merged. Below, in Algorithm 1, we present the pseudocode of the data extraction procedure and explain the steps in detail afterward.

As we can see in the pseudocode, there are two phases in the data extraction procedure. In the first phase, for a selected keyword, the daily Google search volume data is obtained using the dailydata() function from ‘pytrends’, an unofficial Python interface to Google Trends. This function takes as input a particular keyword, a time range and a location, fetches daily search volume data from Google Trends, and returns the GSV of the input keyword for the given time and location in a pandas DataFrame object. The output DataFrame object has 5 columns, which include the date column, the original daily GSV data fetched month by month (which is not comparable across different months but is comparable within a month), the original monthly data fetched at once and the scale used to obtain the scaled daily data. We begin by preprocessing this object by setting the missing values (possibly appearing due to API failure during the function call) to zero and dropping the irrelevant columns. Note that, due to the way Google Trends scales and returns data, special care needs to be taken to make the daily data comparable over different months. To do that, we download the daily data on a month by month basis, along with the monthly-level data for the entire period. The monthly data is downloaded in one go, so that the monthly values are comparable amongst themselves and can be used to scale the daily data. The daily data is scaled by multiplying the daily value by the monthly search volume divided by 100. The reader may refer to the GitHub resource (link: https://github.com/GeneralMills/pytrends) for more details and usage guide on the ‘pytrends’ library in Python.
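The scaling step described above can be illustrated with a minimal sketch. It does not call ‘pytrends’ or Google Trends; the function name `scale_daily_by_month` and the toy numbers are hypothetical, and only the rule "daily value times monthly value divided by 100" is taken from the text.

```python
from datetime import date

def scale_daily_by_month(daily_unscaled, monthly):
    """Make daily scores comparable across months.

    daily_unscaled: dict mapping a date to a 0-100 score that is only
                    comparable within its own month.
    monthly:        dict mapping (year, month) to a 0-100 score that is
                    comparable across months.
    """
    scaled = {}
    for day, value in daily_unscaled.items():
        month_value = monthly[(day.year, day.month)]
        # Rule from the text: daily value * monthly value / 100.
        scaled[day] = value * month_value / 100.0
    return scaled

# Hypothetical toy data (not real search volumes).
daily = {
    date(2020, 3, 1): 50.0,
    date(2020, 3, 2): 100.0,
    date(2020, 4, 1): 100.0,
    date(2020, 4, 2): 40.0,
}
monthly = {(2020, 3): 80.0, (2020, 4): 20.0}
scaled = scale_daily_by_month(daily, monthly)
```

In this toy example both months contain a day with an unscaled score of 100, but after scaling the March peak is four times the April peak, because March as a whole had four times the search interest.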

The above procedure is repeated for each keyword of our choice, keeping the state variable constant in each iteration. We thus obtain a relative score between 0 and 100 assigned to each keyword on the basis of the frequency of its Google search history over a period of 16 months, i.e. 1st December, 2019 to 31st March, 2021. Here, 0 indicates the least searched word, while 100 indicates the most frequently searched word in relative terms. The geographic location is restricted to the aforementioned states of the USA.

In the second phase of the procedure, we obtain the GSV of the keywords, relative to a base keyword. Using a base keyword helps in maintaining the same scale across keywords and hence facilitates better comparison of the search volumes of different keywords. The base-word should not be at one extreme end of the score range. For our analysis, it is chosen to be ‘happy’ and is kept uniform for all states for the entire period. After that, normalization is applied to the relative popularity scores of the keywords. This is a scaling technique in machine learning and is applied during data preparation to change the values of numeric columns in the dataset to use a common scale. We use min-max normalization in this procedure. The reader may refer to Ref. [39] for more details on this method. Next, the normalized relative scores of all the keywords falling in a particular category (see Table 1) are averaged to obtain the daily GSV for a category. We point out that equal weight is given to all of the keywords within a category. We acknowledge that, in general, the average method might not take into account outliers in the scores of keywords, but it is observed that over time, the relative scores of the keywords of each category do not display extreme differences. The above process is repeated for all the states and keywords we consider in this article. At the end of phase 2, we merge the outputs for all cases and obtain the final data-frame that contains the GSV data for the keyword categories, scaled between 0 and 100, for each date and state.
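A minimal sketch of the normalization and averaging steps of the second phase is given below. The helper names and the toy keyword series are hypothetical; the sketch assumes the relative scores have already been computed, rescales each keyword to the 0 to 100 range by min-max normalization, and then gives equal weight to every keyword in a category.

```python
def min_max_normalize(values, low=0.0, high=100.0):
    """Rescale a list of numbers linearly onto [low, high]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread to rescale.
        return [low for _ in values]
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]

def category_average(series_by_keyword):
    """Equal-weight daily average of the normalized keyword series."""
    normalized = [min_max_normalize(s) for s in series_by_keyword.values()]
    # zip(*...) groups the values of all keywords for the same day.
    return [sum(day) / len(day) for day in zip(*normalized)]

# Hypothetical relative scores for two keywords over three days.
toy_category = {
    "yoga": [10.0, 20.0, 30.0],
    "meditation": [5.0, 5.0, 15.0],
}
category_gsv = category_average(toy_category)
```

Here ‘yoga’ normalizes to 0, 50, 100 and ‘meditation’ to 0, 0, 100, so the category series is 0, 25, 100.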

Note that our extraction procedure is inspired by the works of [15,46], who have previously used GSV data for sentiment analysis. We also point out that we restrict ourselves to the first year of the pandemic, as that is of more significant interest to the researchers. However, the same analysis can be carried out by extending the data to the present date, although our exploration suggests that the pandemic has not significantly impacted the search trends in the second half of 2021 or onward.

3. Methodology

3.1. Empirical analysis

We start with the descriptive statistics of the data for the nine categories and for the four states. The summary measures are presented in Table 2. There, small values of the standard deviation indicate that the searching trends for the categories ‘negative’, ‘positive’, ‘wfh’, ‘health’, ‘hygiene’, ‘hobbies’ have usually been more consistent over the entire period for all four states. In contrast, keywords related to ‘parenting’, ‘psychological’ and ‘vaccine’ have observed more variation. The mean pattern typically is at a similar level for all categories and states.

Table 2.

Descriptive statistics (mean, standard deviation, range) and time series properties for the GSV data for different categories and different states.

Group	Category		FL	NY	PA	IL
Mental health determinants	Negative	Mean (sd)	48.61 (8.74)	43.11 (7.33)	52.59 (11.71)	49.78 (8.82)
		Range	(26.00, 76.88)	(26.78, 69.67)	(23.25, 88.75)	(23.75, 78.62)
		JB	0.02	0.00	0.04	0.55
		LB	0.00	0.45	0.00	0.00
		LM	0.00	0.00	0.00	0.00
		ADF	0.01	0.01	0.01	0.01
	Positive	Mean (sd)	49.80 (11.24)	43.63 (8.27)	54.40 (11.14)	45.09 (11.18)
		Range	(22.17, 80.33)	(21.43, 66.00)	(23.00, 86.50)	(11.17, 82.00)
		JB	0.63	0.56	0.26	0.81
		LB	0.00	0.00	0.00	0.00
		LM	0.00	0.00	0.00	0.00
		ADF	0.01	0.01	0.01	0.01
	Psychological	Mean (sd)	57.59 (16.61)	38.83 (10.60)	59.36 (10.35)	52.20 (16.91)
		Range	(16.00, 100.00)	(11.67, 65.33)	(33.00, 67.00)	(15.50, 100.00)
		JB	0.03	0.25	0.21	0.01
		LB	0.00	0.00	0.00	0.00
		LM	0.00	0.00	0.00	0.01
		ADF	0.01	0.01	0.01	0.01
Pandemic relevant practices	Health and lifestyle	Mean (sd)	56.25 (10.22)	59.18 (9.78)	52.81 (10.78)	50.80 (10.97)
		Range	(29.20, 83.40)	(33.00, 89.00)	(21.40, 81.00)	(22.20, 87.00)
		JB	0.07	0.15	0.37	0.48
		LB	0.00	0.00	0.00	0.00
		LM	0.00	0.00	0.00	0.00
		ADF	0.01	0.01	0.01	0.01
	Hygiene	Mean (sd)	45.82 (12.20)	48.11 (11.22)	23.11 (29.88)	43.96 (12.48)
		Range	(13.00, 79.15)	(18.50, 89.00)	(0.00, 100.00)	(9.25, 88.50)
		JB	0.80	0.48	0.00	0.17
		LB	0.00	0.00	0.86	0.00
		LM	0.00	0.00	0.68	0.00
		ADF	0.01	0.01	0.01	0.01
	Vaccine	Mean (sd)	40.42 (17.19)	41.10 (16.49)	40.30 (17.19)	41.90 (16.82)
		Range	(3.00, 98.00)	(8.67, 94.00)	(1.33, 52.33)	(4.00, 94.00)
		JB	0.00	0.00	0.00	0.06
		LB	0.00	0.00	0.00	0.00
		LM	0.00	0.00	0.00	0.00
		ADF	0.01	0.01	0.01	0.01
Daily activity and social well-being	wfh	Mean (sd)	32.98 (9.33)	33.63 (8.87)	38.50 (18.26)	36.83 (15.51)
		Range	(7.00, 50.00)	(10.50, 50.00)	(7.50, 100.00)	(5.00, 100.00)
		JB	0.02	0.01	0.00	0.00
		LB	0.00	0.00	0.00	0.00
		LM	0.00	0.00	0.00	0.00
		ADF	0.01	0.01	0.01	0.01
	Parenting	Mean (sd)	35.94 (20.23)	33.74 (18.95)	33.54 (27.71)	30.00 (20.31)
		Range	(0.00, 94.50)	(0.00, 96.00)	(0.00, 100.00)	(0.00, 100.00)
		JB	0.00	0.00	0.00	0.00
		LB	0.01	0.00	0.12	0.17
		LM	0.32	0.71	0.63	0.13
		ADF	0.01	0.01	0.01	0.01
	Hobbies	Mean (sd)	46.12 (7.62)	48.81 (11.22)	47.62 (10.49)	47.56 (9.72)
		Range	(23.62, 76.12)	(26.50, 76.50)	(20.50, 54.00)	(17.33, 79.83)
		JB	0.00	0.17	0.01	0.07
		LB	0.00	0.00	0.00	0.00
		LM	0.00	0.00	0.00	0.00
		ADF	0.01	0.01	0.01	0.01


Notes: ‘JB’ denotes the p-value for the Jarque-Bera test, ‘LB’ denotes the p-value for the Ljung-Box test, ‘LM’ denotes the p-value for the Lagrange multiplier test, and ‘ADF’ denotes the p-value for the augmented Dickey–Fuller test for stationarity, each implemented as discussed in Section 3.1.

In addition to the standard summary measures, we also conduct relevant tests for checking pivotal time series properties. First, we use the Jarque-Bera (JB) test [29] for assessing normality in each series. Second, to check for significant autocorrelation, we apply the Ljung-Box (LB) test [35] and implement it at lags up to 20. Third, we utilize the Lagrange multiplier (LM) test to check for autoregressive conditional heteroskedasticity (ARCH) effects in the series [17]. This is motivated by the residual diagnostics of an initial autoregressive moving average (ARMA) model fitted to the data following the existing literature [52]. The residuals corresponding to the series in the analysis suggest the presence of heteroskedasticity, and thus we formally run the LM test to detect the presence of ARCH effects. Finally, we conduct an augmented Dickey-Fuller (ADF) test to check for stationarity [19] of the time series data. These properties of the series, as indicated by the aforementioned tests, corresponding to each state-keyword combination are presented in the same table. Note that we have run all these tests at the 5% level of significance.
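To make the first two diagnostics concrete, the following sketch computes the Jarque-Bera statistic (with its asymptotic p-value) and the Ljung-Box Q statistic from their textbook definitions, using only the Python standard library. It is an illustration under stated assumptions, not the implementation used for Table 2; the function names are hypothetical, and in practice a statistical package would normally be used, which would also supply the Ljung-Box p-value and the LM and ADF tests.

```python
import math

def jarque_bera(x):
    """Return (JB statistic, asymptotic p-value) for a list of numbers."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    stat = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

def ljung_box_q(x, max_lag=20):
    """Return the Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    total = 0.0
    for k in range(1, max_lag + 1):
        num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
        rho_k = num / denom  # sample autocorrelation at lag k
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

jb_stat, jb_p = jarque_bera([1.0, 2.0, 3.0, 4.0, 5.0])
q_trend = ljung_box_q([float(t) for t in range(100)])
```

A large Q relative to the chi-square distribution with `max_lag` degrees of freedom signals autocorrelation; the strongly trending toy series above produces a very large value.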

Looking at these properties for each state and keyword combination, we see that most of the p-values for the JB test of normality are close to 0, especially for the last four categories. This clearly rejects the null hypothesis of normality for the respective series. For the other state-keyword combinations, the data do not indicate significant deviation from normality, but for generalizability and consistency in framing the modeling procedure, we shall rely on a non-Gaussian error distribution across all cases. In fact, for the state-keyword combinations that do conform to the normality assumption, our observation has been that a Gaussian error distribution does not yield a significantly better model fit than a generalized error distribution. This is described in more detail in the next subsection.

The p-values for the LB test are mostly close to 0, indicating the presence of significant autocorrelation up to the selected lag. We also note that most of the p-values for the LM test are below 0.05, the exceptions in Table 2 being the ‘parenting’ category and the ‘hygiene’ category for PA, thereby indicating the presence of heteroskedasticity in the bulk of the series. Finally, the p-values of the ADF tests are all below 0.05, so the unit-root null hypothesis is rejected and the data corresponding to all state-keyword combinations can be treated as stationary. In our modeling framework elaborated in the next section, these findings are leveraged extensively to decide on the final model structure.

We next compute the cross-correlation function (CCF) between the series for different states corresponding to each keyword. As an illustration, the cross-correlation behaviors between searching trends of Florida and New York are given in Figure 1. Other pairs also display a similar pattern. In general, for every pair of states and for most of the keywords, the cross-correlation is not found to be highly significant over the entire time-frame we have considered. One may notice that a few of the categories, for instance ‘wfh’, ‘vaccine’, ‘hygiene’ etc., show a considerable number of spikes in the CCF plot. Although this might suggest a unified modeling structure taking all or some states together, our initial explorations indicate that such models do not lead to significantly better results. Furthermore, as the motive of our study is to focus on the searching behavior of the different keywords during the entire duration of the pandemic, it is reasonable to emphasize the temporal structure of the search trends rather than the spatial one. Therefore, we decide to consider each state separately in the main analysis below.

Figure 1. The CCF (up to lag 30) for the searching trends between Florida and New York with respect to different keywords. Note that the categories ‘negative’, ‘positive’ and ‘health and lifestyle’ are written as ‘neg’, ‘pos’ and ‘lifestyle’ here.
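For readers who wish to reproduce a CCF of this kind, a minimal sketch of the sample cross-correlation at a single lag is given below. The function name and the toy series are hypothetical, and the lag convention (pairing `x[t + lag]` with `y[t]`) is an assumption, since conventions differ between software packages.

```python
import math

def cross_correlation(x, y, lag):
    """Sample cross-correlation between x[t + lag] and y[t].

    Both series must have the same length; lag may be negative.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    norm_x = math.sqrt(sum((v - mean_x) ** 2 for v in x))
    norm_y = math.sqrt(sum((v - mean_y) ** 2 for v in y))
    # Keep only the time points where both indices fall inside the series.
    pairs = [(x[t + lag], y[t]) for t in range(n) if 0 <= t + lag < n]
    return sum((a - mean_x) * (b - mean_y) for a, b in pairs) / (norm_x * norm_y)

# Hypothetical toy series standing in for two states.
state_a = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
state_b = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
ccf_lag0 = cross_correlation(state_a, state_b, 0)
# Approximate 95% band often drawn on CCF plots for uncorrelated series.
band = 1.96 / math.sqrt(len(state_a))
```

Values of the CCF that stay inside the approximate band at most lags correspond to the "not highly significant" pattern described above.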

Further in relation to the above point, we also note that the series corresponding to different pairs of categories are not found to be significantly correlated. The correlogram corresponding to the categories is provided in Figure 2. It shows that the correlations between different series are typically not very strong, thereby justifying separate statistical modeling for every series instead of taking a vector-valued time series approach.

Figure 2. The correlogram for the categories across all the states. Here, ‘var1’ and ‘var2’ denote the categories, with the convention: 1:negative, 2:positive, 3:psychological, 4:health and lifestyle, 5:hygiene, 6:vaccine, 7:wfh, 8:parenting, 9:hobbies.

3.2. Statistical modeling

We start this section with a brief summary of the findings from Section 3.1. It is observed that, in general, neither the data corresponding to different states nor the data for different keywords within a single state exhibit considerably high correlation. Thus, we decide to model the time series corresponding to each state and keyword combination separately. Next, following the existing literature in the same line of [7,46], the basic modeling structure we consider for every series is ARMA with orders 1 in both autoregressive and moving average parts. Along with that, empirical analysis advocates for the use of GARCH structure for the errors to take into account the presence of the ARCH effect. We utilize the eGARCH(1,1) framework as it renders better fits, in terms of the Akaike's Information Criterion (AIC), than standard or other available GARCH models. Furthermore, we recall that the normality assumption has been rejected by the JB test in most of the cases. In view of that, the errors are assumed to follow the shape and skew generalized error distribution (SGED). The reader is referred to Ref. [53] for more information about SGED.

It should be pointed out that the choice of the aforementioned mean modeling structure is motivated by our empirical analysis in the previous subsection. In fact, the choice of ARMA modeling structure in analyzing GSV data in different contexts related to the COVID-19 pandemic can be widely found in the existing literature. For example, the study [24] modeled Google searches in the USA containing queries related to suicide and self-harm using an ARMA structure to draw inference on the dip of public mental health status during the COVID-19 pandemic. The study [52] also used a similar modeling structure for Google search queries related to depression, insomnia and other psychiatric conditions during the pandemic. The study [14] used the combination of ARMA and GARCH in assessing the impact of the pandemic on the airlines stock markets. The use of a similar modeling strategy in analyzing the mental health impacts of the COVID-19 pandemic through Google Trends data abounds in the existing literature. Some notable studies in this line are Refs [4,21,33,41].

The model thus defined is hereafter denoted as ARMA(1,1)-eGARCH(1,1), and the formal mathematical description is provided below. Note that the orders of the model are decided based on the AIC; we noticed that higher orders do not improve the model performance significantly. It is also worth mentioning that our aim has been to develop a consistent modeling framework for all series, such that it renders a good fit as well as an apt understanding of the research questions addressed in this article. The aforementioned structure serves that purpose, as we will see in the following section.

Next, we turn our attention to the exogenous regressors used in the model. Many researchers, such as Refs [23,43], established that the number of deaths from previous weeks is often useful in statistical modeling with COVID-19 data. Inspired by these works, we include the variable death-rate in our model to incorporate the effect of the severity of the pandemic on the GSV series. The death-rate variable (DR) is calculated by finding the total number of deaths over the last five days and then dividing it by the total population. We take this approach with the underlying assumption that the death-rate of the present day does not immediately affect people's searching behavior, but affects it at a lag of a few days. It is noteworthy that increasing or decreasing the lag from a period of five days does not affect the results or the efficacy of the model substantially. Similar methods of manipulating death-rate so as to incorporate lagged effect can be found in previous research, e.g. Ref. [13].

Mathematically, let us use $D_{S, t}$ to denote the number of deaths on the $t^{th}$ day in state S, and let $P_{S}$ be the initial population of the same state. Then, the modified death-rate variable for state S and day t can be written as

{DR}_{S, t} = \frac{1}{P_{S}} \sum_{j = 1}^{5} D_{S, t - j} .

(1)
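Equation (1) can be computed with a short sketch such as the one below. The function name and the toy death counts are hypothetical; returning `None` for the first five days, where fewer than five earlier observations exist, is an assumption of this sketch rather than something specified in the text.

```python
def death_rate(daily_deaths, population, window=5):
    """DR_t = (deaths on days t-1, ..., t-window) / population.

    daily_deaths[t] is the number of deaths on day t (0-indexed).
    """
    rates = []
    for t in range(len(daily_deaths)):
        if t < window:
            # Not enough history yet (assumption: mark as missing).
            rates.append(None)
        else:
            # Slice t-window .. t-1 excludes the current day t, as in Eq. (1).
            rates.append(sum(daily_deaths[t - window:t]) / population)
    return rates

# Hypothetical toy data: seven days of deaths in a population of 1000.
toy_rates = death_rate([1, 2, 3, 4, 5, 6, 7], population=1000)
```

For the toy data, the rate on the sixth day is (1 + 2 + 3 + 4 + 5) / 1000 = 0.015, which uses only the five preceding days.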

In line with the above notations, let us use $G_{S, t}^{c}$ to denote the GSV of the $c^{th}$ category in state S on day t; but for generality, unless absolutely needed, we shall drop the subscript S and the superscript c. Now, as exogenous regressors, we intend to incorporate in the model the phase-wise effects of the pandemic. This is achieved by identifying changepoints in each $\{G_{t}\}$ series; that is, by finding significant structural breaks which may have occurred in the keyword searches during the entire timeline of the pandemic. The changepoints are determined using the binary segmentation method [47]. This method is widely used in practice to perform fast signal segmentation. It is a sequential approach where the first changepoint is detected in the complete input signal, then the series is split around this changepoint and the same operation is repeated on the two resulting sub-signals, and so on.

In our application, for changepoint detection in the Google search volume data $G = \{G_{t}\}$ , under the relevant assumptions as alluded to earlier, the binary segmentation method scans the cumulative sum (CUSUM) statistic $| τ_{0, k, n} (G) |$ over $k$ and takes its maximizer $k_{1}$ as a candidate changepoint estimator, where, for $0 ⩽ p < k < e ⩽ n$ ,

τ_{p, k, e} (G) = \sqrt{\frac{(k - p) (e - k)}{(e - p)}} ({\bar{G}}_{p, k} - {\bar{G}}_{k, e}), where {\bar{G}}_{a, b} = \frac{1}{b - a} \sum_{t = a + 1}^{b} G_{t} .

(2)

We then define the maximum CUSUM statistic and the corresponding location as

τ_{0, n}^{\max} = \max_{k = 1 (1) n} | τ_{0, k, n} (G) |, \quad k_{1} = \underset{k = 1 (1) n}{\arg \max} | τ_{0, k, n} (G) | .

(3)

If the maximum CUSUM statistic is sufficiently large, then $k_{1}$ is perceived as a changepoint. Then, the data is partitioned into $\{G_{t}\}_{t = 1}^{k_{1}}$ and $\{G_{t}\}_{t = k_{1} + 1}^{n}$ , and the scanning and maximization of the CUSUM statistic is repeated over each partition separately until a stopping criterion is satisfied. The reader may refer to [8,12,57] for more details on this procedure.
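The recursion in Equations (2) and (3) can be sketched as follows. This is an illustrative implementation under stated assumptions: the function names, the fixed `threshold` used as the stopping criterion, and the `min_gap` argument are choices made for this sketch only, and the toy series is artificial.

```python
import math

def cusum(prefix, p, k, e):
    """tau_{p,k,e}(G) from Eq. (2); prefix[i] = G_1 + ... + G_i."""
    left_mean = (prefix[k] - prefix[p]) / (k - p)
    right_mean = (prefix[e] - prefix[k]) / (e - k)
    return math.sqrt((k - p) * (e - k) / (e - p)) * (left_mean - right_mean)

def binary_segmentation(series, threshold, min_gap=1):
    """Return sorted changepoints k, meaning a shift after observation k."""
    prefix = [0.0]
    for value in series:
        prefix.append(prefix[-1] + value)
    found = []

    def split(p, e):
        # Candidate locations keep at least min_gap points on each side.
        candidates = range(p + min_gap, e - min_gap + 1)
        if len(candidates) == 0:
            return
        k = max(candidates, key=lambda j: abs(cusum(prefix, p, j, e)))
        if abs(cusum(prefix, p, k, e)) > threshold:
            found.append(k)
            split(p, k)  # repeat on the left sub-series
            split(k, e)  # repeat on the right sub-series

    split(0, len(series))
    return sorted(found)

# Artificial series with mean shifts after observations 40 and 70.
toy_series = [0.0] * 40 + [10.0] * 30 + [0.0] * 20
toy_changepoints = binary_segmentation(toy_series, threshold=5.0, min_gap=5)
```

On the artificial series the first scan peaks at k = 40, and the scan of the right-hand partition then recovers k = 70; the constant sub-series produce a zero statistic and are not split further.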

In the existing literature of cohort studies on the mental health impacts of the COVID-19 pandemic, a minimum gap of three months is often maintained between the initial and the follow-up survey (the reader may refer to Refs [45,56] for a brief review of such studies). Keeping that in mind, while implementing this algorithm, we assume that the changes can occur at a minimum gap of three months. Then, we find the three most significant changepoints and denote them as ${CP}_{1}$ , ${CP}_{2}$ and ${CP}_{3}$ , respectively. The phase-wise effects are then captured through the three dummy variables

{CP}_{1_{t}} = I \{t > {CP}_{1}\}, \quad {CP}_{2_{t}} = I \{t > {CP}_{2}\}, \quad {CP}_{3_{t}} = I \{t > {CP}_{3}\} .

(4)
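As a small illustration of Equation (4), the dummy variables can be built as below. The function name and the toy changepoints are hypothetical; days are indexed from 1, matching the notation of the text.

```python
def phase_dummies(n_days, changepoints):
    """Row t-1 holds the indicators I{t > CP_i} for day t = 1, ..., n_days."""
    return [
        [1 if t > cp else 0 for cp in changepoints]
        for t in range(1, n_days + 1)
    ]

# Hypothetical example: six days with changepoints at days 2, 4 and 5.
toy_dummies = phase_dummies(6, [2, 4, 5])
```

Each dummy switches on the day after its changepoint and stays on, so the coefficients attached to them measure cumulative shifts in the mean level from one phase to the next.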

In the above equation, $I \{\cdot\}$ is used to denote the indicator function. It is imperative to point out that the changes in the search behaviors throughout the entire timeline are found to be on the mean level and not on the variance level. Thus, the aforementioned dummy variables help us in quantifying the long-term impact of exogenous shocks, possibly caused by the pandemic. This technique of utilizing dummy variables in assessing the phase-wise effects is common in financial data analysis, see for example [1,59] and relevant references therein.

Overall, the structural setup we use in this study to model the GSV data can be written as

G_{t} = μ + ϕ_{1} G_{t - 1} + δ_{1} {DR}_{t} + δ_{2} {CP}_{1_{t}} + δ_{3} {CP}_{2_{t}} + δ_{4} {CP}_{3_{t}} + e_{t} + θ_{1} e_{t - 1},

(5)

where the error terms are modeled through the equations

e_{t} = ω + σ_{t} ϵ_{t}, \log σ_{t}^{2} = α_{0} + α_{1} | ϵ_{t - 1} | + β_{1} \log σ_{t - 1}^{2} + γ_{1} ϵ_{t - 1} .

(6)

In the above formulation, $ϕ_{1}$ and $θ_{1}$ are the AR and MA coefficients, $δ_{1}$ captures the effect of the death-rate, and the terms $δ_{2}, δ_{3}, δ_{4}$ indicate the effect of different phases of the pandemic. Note that the error terms $ϵ_{t}$ come from a skewed generalized error distribution (SGED) with skewness and shape parameters. Our observation is that if this distribution is replaced by Gaussian white noise for the series that do not reject the normality hypothesis, the results do not change drastically. Therefore, the SGED is used in the model for all state-keyword combinations, and it furnishes a more general framework to analyze the problem. After fitting the model, we carry out significance tests of the model coefficients at a significance level of $α = 0.05$ .
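To illustrate how Equations (5) and (6) interact, the following sketch simulates a series from the same recursion. It is a simplified illustration and not the fitting procedure of this study: standard Gaussian innovations are used in place of the SGED, the starting values are arbitrary, and the function name and all parameter values in the example call are hypothetical.

```python
import math
import random

def simulate_arma_egarch(n, mu, phi1, theta1, delta, X,
                         omega, alpha0, alpha1, beta1, gamma1, seed=0):
    """Simulate G_t from Equations (5)-(6); X[t] is the regressor vector at time t.

    Assumption: eps_t ~ N(0, 1) instead of the SGED used in the paper.
    """
    rng = random.Random(seed)
    g_prev, e_prev, eps_prev = mu, 0.0, 0.0
    log_var_prev = alpha0 / (1.0 - beta1)  # rough starting value, requires |beta1| < 1
    series = []
    for t in range(n):
        # eGARCH(1,1) recursion for the log conditional variance, Equation (6).
        log_var = (alpha0 + alpha1 * abs(eps_prev)
                   + beta1 * log_var_prev + gamma1 * eps_prev)
        sigma = math.exp(0.5 * log_var)
        eps = rng.gauss(0.0, 1.0)
        e = omega + sigma * eps
        # ARMA(1,1) mean equation with exogenous regressors, Equation (5).
        g = (mu + phi1 * g_prev
             + sum(d * x for d, x in zip(delta, X[t]))
             + e + theta1 * e_prev)
        series.append(g)
        g_prev, e_prev, eps_prev, log_var_prev = g, e, eps, log_var
    return series

# Hypothetical example: one death-rate-like regressor and one phase dummy.
X = [[0.1, 1.0 if t > 50 else 0.0] for t in range(1, 101)]
G = simulate_arma_egarch(100, mu=1.0, phi1=0.5, theta1=0.3, delta=[0.2, 2.0], X=X,
                         omega=0.0, alpha0=-0.1, alpha1=0.2, beta1=0.9, gamma1=-0.05)
```

The term $γ_{1} ϵ_{t - 1}$ is what allows positive and negative shocks of the same size to affect the volatility differently, which is the feature that distinguishes the eGARCH specification from a symmetric GARCH one.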

4. Simulation study

As described above, the proposed methodology relies primarily on three aspects: the use of the binary segmentation method in identifying the changepoints, the assumption of three sufficiently separated changepoints, and the use of an ARMA(1,1)-eGARCH(1,1) structure in the modeling framework. In the real data analysis in this paper, we show that the method works fairly well for all cases. Before that, in this section, we conduct a brief simulation study with a three-fold objective. First, we aim to see how well the changepoint identification method detects the true changepoints. Second, we want to evaluate whether the modeling structure works equally well if the true underlying data generating process is different and thereby comment on the adequacy of the proposed methodology in dealing with similarly structured datasets related to mental health impacts or beyond. And finally, in an attempt to investigate the robustness of our approach, we assess the model fit and the accuracy in estimating the effects of the covariates by changing the length of the series as well as by varying the true number of changepoints in the data.

4.1. Simulation setups

To mimic the structure of the main data analyzed in this study, we generate time series observations from different data-generating processes (DGP) and scale them to be between 0 and 100. There are seven different processes we take into account in this regard. The first one is the simplest structure of white noise, where all observations are assumed to be independent and identically distributed (iid) $N (0, 1)$ random variables. We then consider the autoregressive (AR) model and the moving average (MA) model, each with order 1. For the fourth DGP, we make it more general under the homoskedastic setup by considering an autoregressive integrated moving average (ARIMA) model with an integration parameter of 1. Next, we consider three different processes under a conditionally heteroskedastic setup. One of them is the ARCH model alluded to earlier in the paper. Subsequently, an asymmetric power ARCH (APARCH) model of order (1,1) is considered. For the last DGP in this simulation study, we consider an ARMA-GARCH model with ARMA orders (1,2) and GARCH orders (1,1). The details of these modeling structures and the corresponding parameter values fixed throughout our simulation study are briefly specified in Table 3.

Table 3.

Details of the different DGPs used in the simulation study.

DGP	Structure	Parameters
White noise	$G_{t} = X_{t}^{'} δ + ϵ_{t}$	$ϵ_{t} \overset{iid}{\sim} N (0, 1)$
AR	$G_{t} = X_{t}^{'} δ + β_{1} G_{t - 1} + ϵ_{t}$	$β_{1} = 0.5$ , $ϵ_{t} \overset{iid}{\sim} N (0, 1)$
MA	$G_{t} = X_{t}^{'} δ + ϵ_{t} + θ_{1} ϵ_{t - 1}$	$θ_{1} = - 0.6$ , $ϵ_{t} \overset{iid}{\sim} N (0, 1)$
ARIMA	$Y_{t} = X_{t}^{'} δ + β_{1} Y_{t - 1} + β_{2} Y_{t - 2} + ϵ_{t} + θ_{1} ϵ_{t - 1} + θ_{2} ϵ_{t - 2}$ ,	$β_{1} = 0.9$ , $β_{2} = - 0.5$ ,
	where $Y_{t} = G_{t} - G_{t - 1}$	$θ_{1} = - 0.2$ , $θ_{2} = 0.2$ , $ϵ_{t} \overset{iid}{\sim} N (0, 0.2)$
ARCH	$G_{t} = X_{t}^{'} δ + σ_{t} ϵ_{t}$ , where $σ_{t}^{2} = α_{1} G_{t - 1}^{2} + α_{2} G_{t - 2}^{2}$	$α_{1} = 0.2$ , $α_{2} = 0.4$ , $ϵ_{t} \overset{iid}{\sim} N (0, 1)$
APARCH	$G_{t} = X_{t}^{'} δ + σ_{t} ϵ_{t}$ , where $σ_{t} = α_{1} | ϵ_{t - 1} | + β_{1} σ_{t - 1}$	$α_{1} = 0.1$ , $β_{1} = 0.9$ , $ϵ_{t} \overset{iid}{\sim} N (0, 1)$
ARMA-GARCH	$G_{t} = X_{t}^{'} δ + β_{1} G_{t - 1} + β_{2} G_{t - 2} + e_{t} + θ_{1} e_{t - 1} + θ_{2} e_{t - 2}$ ,	$β_{1} = 0.2$ , $β_{2} = - 0.1$ , $θ_{1} = 0.3$ , $θ_{2} = - 0.5$ ,
	where $e_{t} = σ_{t} ϵ_{t}$ , $σ_{t}^{2} = α_{1} ϵ_{t - 1}^{2} + α_{2} σ_{t - 1}^{2}$	$α_{1} = 0.1$ , $α_{2} = 0.9$ , $ϵ_{t} \overset{iid}{\sim} N (0, 1)$


Note: In each equation, $X_{t}^{'} δ$ denotes the effect of the exogenous regressors.

In the simulation exercises, we generate time series of length n from the DGPs specified in Table 3, where $n \in \{500, 1000\}$ . We then introduce $k \in \{1, 3, 5\}$ randomly selected changepoints $\{t_{1}^{true}, t_{2}^{true}, \dots, t_{k}^{true}\}$ in each of the generated series. These lead to the exogenous regressors ${CP}_{1_{t}}, \dots, {CP}_{k_{t}}$ in the same spirit as Equation (4). In addition, we simulate another exogenous regressor $\{E_{t}\}$ from an AR(1) process with coefficient 0.5 (scaled to be between 0 and 1), akin to the exogenous variable death-rate used in the main analysis (see Equation (5)). Using these variables, we define the vector of regressors $X_{t}$ . Correspondingly, the coefficient vector $δ$ is designed to calculate the mean structure $X_{t}^{'} δ$ for each DGP. In each iteration of the simulation exercises, the true values of these coefficients are generated independently from a $N (0, 1)$ distribution. We shall evaluate the accuracy of the proposed methodology in estimating these coefficients.
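The data-generating step can be sketched as follows for the AR row of Table 3. This is only an illustration under assumptions: the helper names are hypothetical, the changepoints are drawn uniformly without any spacing constraint, the regressor $E_{t}$ is read as an AR(1) series with coefficient 0.5, and the final series is min-max scaled to lie between 0 and 100.

```python
import random

def min_max_scale(x, lo=0.0, hi=1.0):
    """Linearly rescale a list of numbers to the interval [lo, hi]."""
    a, b = min(x), max(x)
    return [lo + (hi - lo) * (v - a) / (b - a) for v in x]

def simulate_ar_dgp(n, k, beta1=0.5, seed=0):
    """AR(1) DGP with k random changepoints and one extra regressor E_t."""
    rng = random.Random(seed)
    cps = sorted(rng.sample(range(1, n), k))       # true changepoints in 1..n-1

    # Exogenous regressor E_t: AR(1) with coefficient 0.5, scaled to [0, 1].
    E, prev = [], 0.0
    for _ in range(n):
        prev = 0.5 * prev + rng.gauss(0.0, 1.0)
        E.append(prev)
    E = min_max_scale(E)

    # True coefficients of (E_t, CP_1t, ..., CP_kt), drawn independently from N(0, 1).
    delta = [rng.gauss(0.0, 1.0) for _ in range(k + 1)]

    G, g_prev = [], 0.0
    for t in range(1, n + 1):
        x_t = [E[t - 1]] + [1.0 if t > cp else 0.0 for cp in cps]
        g = sum(d * x for d, x in zip(delta, x_t)) + beta1 * g_prev + rng.gauss(0.0, 1.0)
        G.append(g)
        g_prev = g
    return min_max_scale(G, 0.0, 100.0), cps, delta

G, true_cps, true_delta = simulate_ar_dgp(n=500, k=3, seed=42)
```

The other rows of Table 3 would only change the line that generates `g`, for example by adding a moving average term or a time-varying scale for the innovation.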

While implementing our approach, we first detect the changepoints in the data using the binary segmentation method, under the assumption that there is a gap of at least 90 days between two consecutive changepoints and that there are at most three changepoints in the data. Based on the three detected changepoints, say $\{t_{1}^{est}, t_{2}^{est}, t_{3}^{est}\}$ , we define three indicator variables as (t indicates time)

{CP}_{i_{t}}^{est} = I \{t > t_{i}^{est}\}, i = 1, 2, 3.

(7)

Finally, following the procedure described in Section 3.2, we fit an ARMA(1,1)-eGARCH(1,1) model to the generated data, including ${CP}_{1_{t}}^{est}, {CP}_{2_{t}}^{est}, {CP}_{3_{t}}^{est}$ and $E_{t}$ as exogenous variables. Then, we evaluate the performance of the model and the efficacy of the changepoint identification method for detecting the true changepoints for each of the cases. The metrics used to assess the accuracy are described in the next subsection.

4.2. Evaluation

We first measure how accurately our chosen method of changepoint detection finds the true changepoints. We use the Rand index [42] to calculate this accuracy. This metric is commonly used to compare the similarity of two clustering outcomes. To define it in our context, consider the set $Γ = \{1, 2, \dots, T\}$ where T is the total number of time points considered in the exercise. In line with the aforementioned notations, suppose the true changepoints are at positions $\{t_{1}^{true}, t_{2}^{true}, \dots, t_{k}^{true}\}$ , for $k \in \{1, 3, 5\}$ , while our method finds changepoints at positions $\{t_{1}^{est}, t_{2}^{est}, t_{3}^{est}\}$ . We use these changepoints to define different segments (equivalently, clusters) in the entire timeline Γ. The Rand index is then defined as

RI = \frac{a + b}{\binom{T}{2}} .

(8)

In the above, a denotes the number of unordered pairs of elements from Γ belonging to the same segment in the true setup as well as in the estimated segmentation, whereas b denotes the number of unordered pairs from Γ belonging to different segments in both. Therefore, $(a + b)$ is essentially the total number of agreements between the truth and the estimated changepoints. It is clear that the above measure takes value between 0 and 1. Value of the Rand index closer to 1 would indicate that our method has detected the true changepoints more accurately, while a lower value would point to the inadequacy of the approach.
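The computation in Equation (8) can be sketched in a few lines of Python. The function names are hypothetical, and the direct loop over all pairs is chosen for readability rather than speed; it is adequate for series of the lengths considered here.

```python
from itertools import combinations

def segment_labels(T, changepoints):
    """Label each time point t = 1, ..., T by the number of changepoints before it."""
    cps = sorted(changepoints)
    return [sum(t > cp for cp in cps) for t in range(1, T + 1)]

def rand_index(T, true_cps, est_cps):
    """Share of unordered pairs on which the two segmentations agree (a + b over T choose 2)."""
    truth = segment_labels(T, true_cps)
    est = segment_labels(T, est_cps)
    agreements = 0
    for i, j in combinations(range(T), 2):
        same_in_truth = truth[i] == truth[j]
        same_in_est = est[i] == est[j]
        agreements += same_in_truth == same_in_est   # counts both a and b
    return agreements / (T * (T - 1) / 2)

print(rand_index(4, [2], [2]))  # -> 1.0
print(rand_index(4, [2], [1]))  # -> 0.5
```

In the second call, the true segments are {1, 2} and {3, 4} while the estimated ones are {1} and {2, 3, 4}, so the two segmentations agree on three of the six pairs.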

After having detected the changepoints, we fit the model following the procedure elaborated in the previous subsection. The efficacy of the model is assessed using the AIC value which takes into account the number of observations in the data, total number of parameters in the model, and the sum of squares due to the residuals after fitting the model. Note that AIC can take both positive and negative values and conventionally, a lower value implies better fit of the model.

Furthermore, we look at the root mean squared error (RMSE) of the fitted coefficients for the regressor effects as an evaluation criterion for the goodness of the model. If d denotes the length of $δ$ , the true coefficient vector of interest, then the RMSE is defined by

RMSE (\hat{δ}) = \sqrt{\frac{1}{d} \sum_{r = 1}^{d} {‖ {\hat{δ}}_{r} - δ_{r} ‖}^{2}},

(9)

where $\hat{δ}$ is the coefficient vector estimated through our model. Lower values of RMSE indicate better performance of our model. In the following subsection, we are going to report these three accuracy measures for different simulation setups.
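For completeness, Equation (9) amounts to the following short computation. The function name and the numbers in the example are hypothetical and serve only as an illustration.

```python
import math

def rmse(delta_hat, delta):
    """Root mean squared error between estimated and true coefficient vectors."""
    d = len(delta)
    return math.sqrt(sum((h - t) ** 2 for h, t in zip(delta_hat, delta)) / d)

# Two coefficients, the second estimated with an error of 2.
print(rmse([1.0, 2.0], [1.0, 4.0]))  # equals sqrt(2)
```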

4.3. Simulation results

The results obtained from implementing our methodology on simulated data under different DGPs are presented in Table 4. Recall that the Rand index quantifies the accuracy in estimating changepoints, AIC provides a measure of goodness of the fitted model, while the RMSE helps us understand the accuracy in estimating the regressor effects.

Table 4.

Performance of the proposed model, in terms of Rand index (accuracy in estimating changepoints), AIC (goodness of model fit) and RMSE (accuracy in estimating coefficients), under different DGPs.

			n = 500			n = 1000
DGP	k	$RI$	AIC	RMSE	$RI$	AIC	RMSE
White noise	1	0.831	2.849	0.340	0.886	2.853	0.501
	3	0.785	2.877	0.733	0.797	2.867	0.762
	5	0.747	2.903	0.971	0.833	2.876	0.980
AR	1	0.717	2.599	0.952	0.704	1.141	0.501
	3	0.752	1.169	0.756	0.757	1.156	0.647
	5	0.777	1.201	0.920	0.814	1.170	0.868
MA	1	0.839	1.163	0.240	0.889	1.159	0.260
	3	0.790	1.238	0.691	0.798	1.191	0.616
	5	0.765	1.293	0.882	0.832	1.220	0.839
ARIMA	1	0.840	1.158	0.222	0.839	1.155	0.222
	3	0.788	1.226	0.702	0.798	1.805	0.613
	5	0.749	1.296	0.962	0.837	1.210	0.894
ARCH	1	0.837	3.580	0.380	0.880	3.585	0.726
	3	0.798	3.602	0.738	0.784	3.596	0.904
	5	0.769	3.619	0.995	0.922	3.611	1.297
APARCH	1	0.845	2.476	0.313	0.899	2.477	0.498
	3	0.796	2.514	0.714	0.795	2.495	0.691
	5	0.759	2.545	0.946	0.831	2.505	0.976
ARMA-GARCH	1	0.848	−7.005	0.301	0.897	−7.778	0.305
	3	0.788	−5.709	0.762	0.806	−3.611	1.171
	5	0.756	−4.470	0.888	0.824	−5.777	1.265


Note: k is the true number of changepoints and n is the length of each series.

We observe that our proposed changepoint detection method works fairly well if the underlying data-generating structure is homoskedastic. Irrespective of the total number of true changepoints, the three structural breaks detected using our method under the two assumptions record a Rand index of at least 0.70. In most cases, our approach gives more accurate results if the true number of changepoints (k) is less than or equal to 3 than it does for k = 5. The AR process is an exception, where the results display the opposite pattern. Moreover, as expected, the overall performance of the changepoint detection method seems to improve as the sample size is increased.

On the other hand, the efficacy of the model does not change much in terms of AIC values if the size of the data is changed from 500 to 1000. The RMSE values mostly remain similar as well when the sample size is increased. However, for both sample sizes 500 and 1000, the model mostly performs better if the true number of changepoints is less than 3, both in terms of AIC and RMSE values, except for the case of the simple autoregressive process. Keeping the sample size and the true number of changepoints fixed, the efficacy improves as the complexity of the homoskedastic data-generating structure is increased, even though the performance of the changepoint detection method does not seem to change much with it.

Moving on to the heteroskedastic DGPs, a noticeable improvement is observed in the model fits as well as in identifying structural breaks. In fact, the minimum accuracy of the detected changepoints increases to about 0.76 as we make the underlying DGP heteroskedastic. However, there is a slight increase in the AIC and the RMSE values in the case of ARCH and APARCH processes. Among the heteroskedastic setups, a drastic reduction in the AIC values (which become negative) is observed when the underlying DGP is ARMA-GARCH, which indicates that our methodology can capture the conditional heteroskedasticity structure when the true DGP has the same nature. Interestingly, there is no significant change in the RMSE values in such a case, thereby implying that across all scenarios our method works equally well in estimating the regressor effect. Among the heteroskedastic DGPs there is no discernible difference in the AIC and RI values between the cases of n = 500 and n = 1000. In fact, the RMSE values show a tendency of slightly increasing in the latter case. For a fixed sample size, as we have also observed for the homoskedastic case, the technique works better if the true number of changepoints is less than or equal to 3, although the performance is fairly good for other cases too.

Overall, our observation is that the proposed binary segmentation approach works quite well if the sample size is large with respect to the number of changepoints, irrespective of the true number of such points in the data. The procedure suggested in this article works equally well even if the true structure of the underlying DGP is different from that of ARMA(1,1)-eGARCH(1,1). Consequently, one may argue that the methodology can be utilized or extended with suitable modifications for similar applications in the future.

5. Main results and discussions

5.1. Changepoints

As discussed in Section 3, we first determine the three most important changepoints using binary segmentation method assuming a minimum gap of 90 days between two consecutive changepoints. The changepoints found following this approach, for each state-keyword combination, are presented in Table 5.

Table 5.

Changepoints identified in all series in the four States.

		FL	NY	PA	IL
Negative	${CP}_{1}$	13th May, 2020	24th Apr, 2020	28th Apr, 2020	7th May, 2020
	${CP}_{2}$	12th Sep, 2020	25th Jul, 2020	28th Jul, 2020	8th Sep, 2020
	${CP}_{3}$	21st Dec, 2020	25th Oct, 2020	8th Nov, 2020	10th Dec, 2020
Positive	${CP}_{1}$	26th Apr, 2020	5th Feb, 2020	29th Apr, 2020	7th Jun, 2020
	${CP}_{2}$	5th Aug, 2020	26th Jul, 2020	31st Jul, 2020	25th Sep, 2020
	${CP}_{3}$	3rd Nov, 2020	30th Oct, 2020	30th Nov, 2020	31st Dec, 2020
Psychological	${CP}_{1}$	12th Feb, 2020	13th Apr, 2020	19th Apr, 2020	5th Apr, 2020
	${CP}_{2}$	14th Jun, 2020	30th Jul, 2020	22nd Aug, 2020	1st Aug, 2020
	${CP}_{3}$	14th Dec, 2020	30th Oct, 2020	27th Dec, 2020	30th Oct, 2020
Lifestyle	${CP}_{1}$	21st Feb, 2020	22nd Feb, 2020	5th Mar, 2020	23rd Feb, 2020
	${CP}_{2}$	22nd Jul, 2020	18th Aug, 2020	1st Jul, 2020	30th Jul, 2020
	${CP}_{3}$	10th Dec, 2020	31st Dec, 2020	16th Nov, 2020	17th Dec, 2020
Hygiene	${CP}_{1}$	28th Feb, 2020	9th Feb, 2020	30th May, 2020	9th Feb, 2020
	${CP}_{2}$	10th Sep, 2020	12th Jun, 2020	4th Sep, 2020	21st Jun, 2020
	${CP}_{3}$	22nd Dec, 2020	30th Oct, 2020	21st Dec, 2020	21st Oct, 2020
Vaccine	${CP}_{1}$	20th Feb, 2020	2nd May, 2020	4th May, 2020	10th Feb, 2020
	${CP}_{2}$	3rd Jul, 2020	31st Jul, 2020	31st Aug, 2020	27th Aug, 2020
	${CP}_{3}$	7th Dec, 2020	7th Dec, 2020	8th Dec, 2020	8th Dec, 2020
wfh	${CP}_{1}$	5th Apr, 2020	15th Feb, 2020	15th Feb, 2020	21st Feb, 2020
	${CP}_{2}$	9th Jul, 2020	18th Jun, 2020	15th Jun, 2020	24th Sep, 2020
	${CP}_{3}$	9th Oct, 2020	18th Sep, 2020	9th Dec, 2020	24th Dec, 2020
Parenting	${CP}_{1}$	1st Apr, 2020	18th Feb, 2020	29th Feb, 2020	2nd Feb, 2020
	${CP}_{2}$	4th Jul, 2020	18th Jun, 2020	9th Jul, 2020	24th Jun, 2020
	${CP}_{3}$	28th Oct, 2020	20th Sep, 2020	10th Dec, 2020	22nd Oct, 2020
Hobbies	${CP}_{1}$	26th Apr, 2020	29th Feb, 2020	2nd Feb, 2020	29th Feb, 2020
	${CP}_{2}$	26th Jul, 2020	31st May, 2020	9th Jul, 2020	31st May, 2020
	${CP}_{3}$	14th Nov, 2020	1st Oct, 2020	8th Oct, 2020	29th Aug, 2020


Note: the category ‘health and lifestyle’ is denoted by ‘lifestyle’.

It is evident that the three changepoints thus obtained are consistent across all state-keyword combinations. Even though the exact time of occurrence of the three waves of the outbreak has been a separate stream of research (see Refs [2,44,50]), we can still broadly conclude that the three changepoints found typically align well with the three waves of the pandemic in the USA. The first changepoint falls mostly in the time range of February 2020 to May 2020. It is roughly the time when state-wide stay-at-home orders were issued and all non-life-sustaining business operations and services, including schools, theaters, bars and restaurants, were closed. The second changepoint is found to be between June 2020 and September 2020, which is when the reopening of restaurants, bars, theaters and other public gatherings was allowed, albeit with restricted capacity and mandatory COVID-19 guidelines. Finally, the third changepoint falls in the range of October 2020 to December 2020. Not only were the state regulations on public gatherings further relaxed, but the vaccination procedures also started in that phase. Moreover, in the last part of 2020, the delta-variant of the coronavirus, which was said to be around twice as contagious as the previous variants [54], was found and was quickly spreading in other countries before it hit the USA as well. All these events are expected to impact the mental health and social lives of the public, and that is rightly captured by the changepoints in our analysis.

At this point, we may also take note of the similarity of the changepoints across different states, and across keywords for the same state. For example, for each state, the ‘work-from-home’ related keywords and the ‘parenting’ related keywords experience changepoints only a few days apart. One can hypothesize that these two aspects of people's lives are closely connected and that is reflected in the online search patterns as well. Similarly, for each state, the search patterns of the keywords related to the pandemic relevant practices observe structural breaks at around the same time. From the additional graphs presented in the Supplementary Material, one may further note that the changepoints for the nine keywords within the same state are not too far from each other either.

5.2. Estimation results

We move on to the estimation results for the ARMA(1,1)-eGARCH(1,1) models fitted to the series for each state-keyword combination. Recall that the exogenous regressors used in the model (see Equation (5)) are based on the above-mentioned changepoints and the death-rate. Since these covariates are of primary importance, we summarize their estimates and the standard errors in Table 6. The other model coefficients are provided for the readers' reference in the Supplementary Material. Akin to the previous sections, the four regressors are termed DR, ${CP}_{1}$ , ${CP}_{2}$ , and ${CP}_{3}$ in the table. We discuss the overall results corresponding to the three groups of keywords in the next three sections. Before that, a concise summary of the general findings is given below. We also point out that the suitability of our modeling structure and the choice of the error distribution are discussed in more depth in Appendix A of the Supplementary Material, while some additional results are presented in Appendix B.

Table 6.

Estimated coefficients (along with standard errors) for the exogenous regressors used in the model.

		FL	NY	PA	IL
Negative	DR	0.150 (0.370)	−0.078 (0.035)*	−0.223 (0.235)	−0.365 (0.282)
	${CP}_{1}$	−1.364 (1.933)	2.839 (0.977)*	8.276 (3.283)*	−2.072 (2.229)
	${CP}_{2}$	1.152 (1.630)	−2.573 (0.941)*	−3.964 (2.725)	1.775 (1.954)
	${CP}_{3}$	−4.816 (1.858)*	1.450 (0.702)*	6.639 (2.514)*	−2.251 (2.006)
Positive	DR	−0.342 (0.367)	−0.022 (0.059)	−0.142 (0.157)	−0.081 (0.154)
	${CP}_{1}$	6.656 (2.693)*	−0.686 (1.381)	2.444 (1.449)	6.986 (1.221)*
	${CP}_{2}$	4.027 (2.416)	4.150 (1.502)*	−4.063 (1.847)*	−2.410 (1.343)
	${CP}_{3}$	−6.098 (1.874)*	−3.479 (1.416)*	6.094 (1.952)*	3.888 (1.460)*
Psychological	DR	−0.116 (0.549)	−0.056 (0.049)	−0.254 (0.138)	−0.782 (0.268)*
	${CP}_{1}$	−4.793 (2.755)*	8.368 (1.355)*	4.736 (1.526)*	5.984 (1.453)*
	${CP}_{2}$	8.534 (2.613)*	−1.925 (2.857)	−4.899 (1.498)*	−1.537 (1.260)
	${CP}_{3}$	−2.767 (2.417)	−4.471 (2.228)*	7.608 (1.717)*	−0.490 (2.404)
Lifestyle	DR	−0.145 (0.357)	0.079 (0.059)	−0.056 (0.180)	−0.374 (0.222)
	${CP}_{1}$	8.789 (1.900)*	5.420 (1.668)*	6.486 (2.255)*	7.096 (2.062)*
	${CP}_{2}$	−2.583 (2.147)	−5.020 (1.571)*	−3.567 (2.125)	−2.923 (1.599)
	${CP}_{3}$	−3.841 (1.529)*	−0.944 (1.546)	1.643 (2.245)	−2.186 (1.761)
Hygiene	DR	−0.144 (0.471)	0.023 (0.125)	0.0002 (0.000)*	−0.543 (0.167)*
	${CP}_{1}$	13.211 (3.444)*	6.758 (5.182)*	−0.033 (0.024)	8.904 (1.725)*
	${CP}_{2}$	−5.919 (2.056)*	3.770 (3.807)	0.038 (0.024)	4.947 (1.345)*
	${CP}_{3}$	6.862 (2.666)*	−3.867 (1.619)*	−0.327 (0.101)*	−3.301 (1.208)*
Vaccine	DR	−1.831 (0.984)	0.179 (0.088)*	−0.914 (0.214)*	−1.225 (0.422)*
	${CP}_{1}$	5.958 (4.355)	−0.957 (2.586)	6.736 (2.494)*	8.446 (3.558)*
	${CP}_{2}$	6.991 (4.459)	1.774 (2.765)	−6.750 (2.253)*	−5.536 (3.486)
	${CP}_{3}$	22.229 (3.349)*	12.665 (2.519)*	24.831 (2.739)*	22.826 (4.156)*
wfh	DR	−0.601 (0.236)*	0.033 (0.034)	−0.252 (0.130)*	0.339 (0.543)
	${CP}_{1}$	14.268 (1.670)*	2.493 (1.039)*	13.471 (1.626)*	4.929 (2.152)*
	${CP}_{2}$	−2.300 (1.667)	−0.328 (1.237)	−6.517 (2.147)*	−8.693 (1.926)*
	${CP}_{3}$	−1.766 (1.389)*	−0.767 (1.168)	−3.479 (2.527)	−2.500 (1.774)
Parenting	DR	−0.167 (0.615)	−0.038 (0.128)	−0.114 (0.004)	0.002 (0.012)
	${CP}_{1}$	3.617 (2.989)	6.461 (3.723)*	10.850 (4.427)*	7.711 (3.382)*
	${CP}_{2}$	1.092 (4.116)	−6.758 (3.739)	−0.031 (0.024)	0.022 (0.082)
	${CP}_{3}$	−5.304 (3.106)	0.963 (2.179)	−0.033 (0.093)	−4.376 (2.698)
Hobbies	DR	0.515 (0.206)*	−0.054 (0.066)	0.190 (0.161)	−0.104 (0.189)
	${CP}_{1}$	2.391 (0.775)*	−0.372 (1.870)	−5.096 (1.820)*	−1.931 (2.414)
	${CP}_{2}$	−5.767 (1.185)*	2.797 (1.918)	6.721 (1.700)*	4.277 (2.185)*
	${CP}_{3}$	0.973 (0.061)	−4.190 (1.243)*	−4.303 (1.787)*	−4.095 (1.575)*


Notes: * is used to indicate significance at level 0.05. Note that the category ‘health and lifestyle’ is renamed to ‘lifestyle’.
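As a reading aid for Table 6, one common way of judging significance from an estimate and its standard error is a two-sided test based on the normal approximation, sketched below. This is an assumption made for illustration: the exact test used to produce the asterisks is not spelled out here, so the flags in the table need not coincide with this simple rule in every cell. The function name is hypothetical.

```python
import math

def wald_z_test(estimate, std_error, level=0.05):
    """Two-sided z-test under a normal approximation (an assumption of this sketch)."""
    z = estimate / std_error
    normal_cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - normal_cdf)
    return z, p_value, p_value < level

# CP_1 for the 'negative' category in NY: 2.839 (0.977), starred in Table 6.
print(wald_z_test(2.839, 0.977))
# DR for the 'negative' category in IL: -0.365 (0.282), not starred in Table 6.
print(wald_z_test(-0.365, 0.282))
```

Under this rule, a coefficient is flagged whenever its absolute value exceeds roughly 1.96 times its standard error.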

We observe that for each series there is at least one changepoint that shows a significant coefficient, indicating that the internet search trends of the keywords considered here have certainly gone through a significant change at one phase in the pandemic or the other. This supports our belief that different phases of the pandemic impacted the social life and the mental health of people. In fact, we note that in most of the cases, the coefficient of the death-rate variable is not significant. A likely explanation is that the day-to-day severity of the pandemic in the respective states has not impacted people instantly. Instead, the long duration of the pandemic, lockdown measures and other restrictions imposed by the governments have played a bigger role in the change of public mental health status and that happened in a phased way.

The results are not found to be similar across the four states for most of the categories. This may be due to the demographic, social and other factors involved. The reader may also find several studies (see, for example, [20,27]) highlighting the fact that the pandemic has caused a significant increase in inequalities in the mental health status, both within the population as a whole and between demographic groups.

Another interesting observation is that the coefficients of death-rate are often significantly negative in the models corresponding to certain categories (e.g. negative, psychological, hygiene, vaccine) in one or more states. While this might give one a counter-intuitive impression that the prevalence of negative feelings in individuals decreases as the severity of the pandemic keeps rising, these results are consistent with several other studies, cf. Refs [38,41]. A possible explanation for this phenomenon is that in the proximity of life-threatening events, individuals may change their viewpoint on mortality and become more positive. Another possible rationale behind the decrease in negative feelings with the increase in severity of COVID-19 is an increased social cohesion as an adaptive coping strategy in such crises.

5.2.1. Discussions about the mental health determinants

This group deals with the categories ‘negative’, ‘positive’ and ‘psychological’. For the first category, the changepoints occur at slightly different times for the four states. New York and Pennsylvania show a surge in negativity towards the end of April of 2020, while the other two states show no significant change in this phase. In the second phase of the pandemic, the searches on negative keywords are found to decrease significantly for New York. Then, from late October to early November, both Pennsylvania and New York experience a further significant increase of negative feelings. As we alluded to earlier, this can be due to the effect of the delta-variant and the associated surge in COVID-19 numbers. In fact, the second wave of the outbreak in the country started around this phase.

Quite expectedly, the results for the ‘positive’ category bear similar meaning. In New York, the positivity increased in the second phase, i.e. from late July of 2020. This might be due to the fact that New York had the highest number of confirmed cases until July. As it started to decrease, the public mental health improved. In contrast, it took a hit in the third phase, which aligns well with our previous finding as well. Florida and Pennsylvania, on the other hand, observed the first changepoint in the search patterns of this category towards the end of April of 2020 (refer to Table 5), and it shows a positive effect in the former. A similar phenomenon happened in Illinois too, although the first phase starts there in June of 2020. Coming to the third phase of the pandemic, we notice that it started early in Florida and New York (refer to Table 5), and that shows a significant dip in positive feelings. Contrarily, in the other two states, the third phase is in December 2020 and beyond, and that period of the pandemic observed an improvement of the mental health in general.

In the psychological category, New York, Pennsylvania and Illinois experience a surge during early to mid-April of 2020, when the states went under full lockdown for the first time. In Florida, the surge is observed roughly around mid-June. Over the next phases, these behaviors decreased in all states, which may be attributed to the increased accessibility of online support services during these times. For instance, Illinois launched a $24 \times 7$ mental health hotline, Call4Calm, to help people experiencing stress, anxiety or other mental health-related issues during isolation. Online counselling and therapy sessions were also made widely available by this time in these states. However, during the second wave of the pandemic, from December 2020, we see a significant increase in online searches on psychological aspects in Pennsylvania.

Interestingly though, of all the models in this group, only two series (negative searches in New York and psychological searches in Illinois) display significant coefficients of death-rate, both being negative. This may be due to the previously discussed reasons or due to a confounding effect of the other coefficients which are of higher magnitudes. Another conjecture is that with a decrease in death-rate, the regulations are relaxed and consequently people can avail increased access to mental health-related facilities, thereby justifying the increase in the searches of keywords within these categories. Overall, we can argue that the changes in the people's behaviors have been primarily affected by the length, rather than the severity, of the pandemic.

5.2.2. Discussions about the pandemic relevant practices

Recall that the ‘health and lifestyle’ category involved people's interest in immunity, yoga, meditation, etc. The estimated coefficients depict a consistent increase in these aspects for all states in the first phase, especially from the end of February to early March of 2020. It does not observe a significant change in the second phase for Florida, Illinois and Pennsylvania, but decreases systematically for New York over the next phases. Interestingly, towards the end of 2020, the decrease is observed to be significant for Florida as well. This suggests that the pandemic induced a strong interest in a healthy lifestyle, which stabilized over its later phases.

Searches of hygiene-related keywords observed a jump in the month of February 2020 in Florida, New York and Illinois. This result is consistent with Ref. [40], which reports a sharp growth in sales of hand sanitizers and wipes by around 5,678 percent as compared to the previous years. After a sharp and significant decrease around mid-September, in Florida, public interest in hygiene increased further during December of 2020, which might be associated with the outbreak of the delta-variant, especially since Florida has been one of the most affected states in the USA in the later phases of the pandemic. In all the other states, the hygiene-related searches decreased towards the end of 2020, which can be a reaction to the vaccination drives. Intriguingly, in Pennsylvania, the hygiene-related searches are significantly positively associated with the death-rate, indicating that the extremity of the infections caused an increase in people's interest in hygiene.

The results from the ‘vaccine’ category corroborate this finding too, as the coefficients for the third changepoint (which occurred consistently in the first week of December 2020 in all four states) are highly significant and positive. Meanwhile, during the first phase, roughly around February to May of 2020, we notice a significant surge in vaccine-related searches in Pennsylvania and Illinois. This might be an immediate reaction to the increasing number of cases and deaths due to COVID-19 in this phase. For New York, this is reflected in the death-rate coefficient, which signifies that the severity of COVID-19 made people more inquisitive about available vaccines and their effectiveness against this then-unknown virus.

5.2.3. Discussions about the daily activity and social well-being

The culture of work-from-home (wfh) has gained exceptional popularity since the pandemic. All the states considered in this study consistently show a significant increase of high magnitude in related searches from February to April of 2020. Over the next phase, it starts to decrease due to people getting more habituated with the culture. In fact, during the third phase, towards the end of 2020, there is a further drop in wfh-related interest in all the states, perhaps due to the reopening of schools and offices following the vaccination drives.

Akin to wfh, there has been a consistent surge in parenting-related searches in the first phase of the pandemic (between February and early April, refer to Table 5) in all states. This can be explained by the sudden imposition of stay-at-home orders and the unavailability of child care services. Over the next phases though, these searches stabilize, and in fact they tend to decrease systematically. This observation substantiates the finding of Ref. [10] that the parent-child relationship actually improved during the pandemic.

Finally, we turn to the results for the hobbies-related searches. These series exhibit considerably different changepoints across states. New York experiences its first change in people's behavior towards the end of February 2020, whereas Florida has it two months later. The coefficients show mixed results as well. In Florida, people's interest in hobbies increased in the first phase, whereas it decreased significantly in Pennsylvania. In the second phase, the numbers escalate in Pennsylvania and Illinois, whereas Florida experiences the opposite behavior. Towards the end of the year, meanwhile, the interest in hobbies decreases significantly in most cases. We also note that for Florida, the interest in hobbies is significantly positively correlated with the death-rate, although severity is not found to have a comparable effect in the other states.

6. Conclusion

In this study, we have used internet search volume data collected on a daily basis for a few selected states of the USA to show that it can help us decipher the changes in people's behavior during a crisis like the coronavirus pandemic. Guided by an extensive empirical analysis, we develop changepoint-induced ARMA-eGARCH models to analyze the data. As illustrated in Appendix A of the Supplementary Material, the model not only explains the variability of the data better, but also delivers good predictive performance.
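The changepoints that enter these models are located by binary segmentation, as stated in the abstract. The following is a minimal sketch of that idea for shifts in the mean only, not the implementation used in this study: the CUSUM-type score, the unscaled `threshold`, the `min_size` guard and the synthetic `series` are all illustrative assumptions.

```python
def best_split(x, lo, hi, min_size):
    """Best single mean-shift location in x[lo:hi], with its CUSUM-type score."""
    n = hi - lo
    total = sum(x[lo:hi])
    left = sum(x[lo:lo + min_size - 1])   # running sum of the left segment
    best_k, best_score = None, 0.0
    for k in range(lo + min_size, hi - min_size + 1):
        left += x[k - 1]                  # left segment is now x[lo:k]
        n_left, n_right = k - lo, hi - k
        gap = abs(left / n_left - (total - left) / n_right)
        score = gap * (n_left * n_right / n) ** 0.5
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


def binary_segmentation(x, threshold, min_size=5, lo=0, hi=None):
    """Recursively split x[lo:hi] wherever the best score reaches the threshold."""
    hi = len(x) if hi is None else hi
    if hi - lo < 2 * min_size:
        return []
    k, score = best_split(x, lo, hi, min_size)
    if k is None or score < threshold:
        return []
    left = binary_segmentation(x, threshold, min_size, lo, k)
    right = binary_segmentation(x, threshold, min_size, k, hi)
    return left + [k] + right


# Synthetic, noise-free series with level shifts at t = 30 and t = 60 (assumed data).
series = [0.0] * 30 + [5.0] * 30 + [1.0] * 30
changepoints = binary_segmentation(series, threshold=3.0)
```

In practice the threshold would be calibrated to the noise level of the series; here it is a fixed number chosen only to make the toy example work.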

One of our key findings is that the general nature of the significant changes in people's behavior and interest in nine important aspects is consistent across the states. To further assess the effect of these changes, we introduce these changepoints, along with the severity of the pandemic quantified by the death-rate, as exogenous regressors in the model. The results of these models provide strong evidence that in the first phase of the pandemic, as an immediate response to the outbreak and initial regulations, there was a surge in the searches of the keywords which are primarily related to negative feelings. This finding is consistent with Ref. [31], which reports a significant elevation in mental health issues including depression, anxiety, insomnia, etc. during the first three weeks of the lockdown and stay-at-home orders. However, during the later phases of the pandemic, in most cases we see the general interest level off. One may argue that people tend to develop coping mechanisms against the pandemic, or that they might have already explored the resources available on the internet that could be of help. This is in line with the findings of other studies conducted to assess the public mental health status during the pandemic, cf. Refs [3,25].
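The regressor construction described above can be illustrated with a simplified sketch. It assumes that each changepoint enters as a step indicator (0 before the changepoint, 1 from it onward) and swaps the ARMA-eGARCH fit for plain least squares, so it shows only how the phase and death-rate coefficients are read, not the model used in this study. The function names and the synthetic data are hypothetical.

```python
def design_matrix(n, changepoints, death_rate):
    """Rows of [intercept, one step indicator per changepoint, death-rate]."""
    return [
        [1.0] + [1.0 if t >= tau else 0.0 for tau in changepoints] + [death_rate[t]]
        for t in range(n)
    ]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (b[i] - s) / A[i][i]
    return beta


# Synthetic example (assumed data): two changepoints and a toy death-rate series.
n, taus = 60, [20, 40]
death_rate = [0.1 * (t % 7) for t in range(n)]
X = design_matrix(n, taus, death_rate)
# Search interest rises by 5 after the first changepoint, falls by 3 after the second.
y = [10.0 + 5.0 * row[1] - 3.0 * row[2] + 2.0 * row[3] for row in X]
beta = ols(X, y)   # [intercept, shift at tau_1, shift at tau_2, death-rate effect]
```

Under this coding, a positive coefficient on a step indicator means that search interest moved to a higher level after that changepoint, while the last coefficient captures the association with severity.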

Another compelling finding of the proposed approach is that people's behavior during a crisis tends to be affected mostly by the long duration rather than the severity of the pandemic. This prolonged nature of COVID-19 is in turn associated with long-term uncertainty and lack of financial security, which have previously been shown to be important factors behind declining public mental health status (see Refs [16,61]).

We conclude this article with a succinct account of some limitations and future directions of the paper. As alluded to before, the proposed technique tends to predict poorly for series with high volatility. Such fluctuations in public interest typically stem from the heterogeneity in the population of internet users. Therefore, an improvement over the recommended approach would be to include changes in the ethnic composition, socio-economic status and other demographic factors of the internet users in the statistical model. Another immediate future direction of the present work is to combine the data from all states of the USA and gauge the similarities and dissimilarities of social and mental health through a unified statistical framework. This could potentially be achieved through multivariate time series techniques.

Supplementary Material

Supplemental Material (additional data file; PDF, 178.7 KB)

7. Data availability statement

COVID-19 data from the United States of America are used in this study. This dataset is publicly available in the GitHub repository maintained by Johns Hopkins CSSE (link: https://github.com/CSSEGISandData/COVID-19). Google Search Volume data is extracted from the Google Trends API using the pytrends library. All cleaned data and code used in this study can be made publicly available on request.
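As a rough illustration of the extraction step, the sketch below shows how a state-level request could be issued through pytrends. The helper names, the example keyword and the dates are hypothetical, and the keyword lists actually used in this study are not reproduced here. `fetch_interest` needs network access and the third-party pytrends package, which is why it is imported lazily.

```python
def build_request(keywords, state_code, start, end):
    """Assemble keyword arguments for pytrends' build_payload (no network access)."""
    if not 1 <= len(keywords) <= 5:
        raise ValueError("Google Trends compares at most five terms per request")
    return {
        "kw_list": list(keywords),
        "timeframe": f"{start} {end}",   # 'YYYY-MM-DD YYYY-MM-DD'
        "geo": f"US-{state_code}",       # e.g. 'US-NY' for New York
    }


def fetch_interest(keywords, state_code, start, end):
    """Query Google Trends; returns a pandas DataFrame indexed by date."""
    from pytrends.request import TrendReq   # third-party, imported only when called
    client = TrendReq(hl="en-US", tz=360)
    client.build_payload(**build_request(keywords, state_code, start, end))
    return client.interest_over_time()


# Hypothetical request: one keyword, New York, first half of 2020.
request = build_request(["insomnia"], "NY", "2020-01-01", "2020-06-30")
```

Google Trends reports relative interest on a 0 to 100 scale within each request, so series fetched in separate requests are not directly comparable without rescaling.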

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Adesina T., Estimating volatility persistence under a Brexit-vote structural break, Finance Res. Lett. 23 (2017), pp. 65–68.
2. Akdi Y., Emre Karamanoğlu Y., Ünlü K.D., and Baş C., Identifying the cycles in COVID-19 infection: the case of Turkey. J. Appl. Stat. 1–13, 2022.
3. Akhther N. and Sopory P., Seeking and Sharing Mental Health Information on Social Media During COVID-19: Role of Depression and Anxiety, Peer Support, and Health Benefits. Journal of Technology in Behavioral Science 1–16, 2022.
4. Alibudbud R., Googling “mental health” after mental health legislation and during the COVID-19 pandemic: an infodemiological study of public interest in mental health in the Philippines, J. Mental Health 31 (2022), pp. 568–575.
5. Amerio A., Odone A., Aguglia A., Gianfredi V., Bellini L., Bucci D., Gaetti G., Capraro M., Salvati S., Serafini G., and Signorelli C., La casa de papel: A pandemic in a pandemic, J. Affect. Disord. 277 (2020), pp. 53.
6. Awijen H., Zaied Y.B., and Nguyen D.K., Covid-19 vaccination, fear and anxiety: Evidence from Google search trends, Social Sci. Med. 297 (2022), pp. 114820.
7. Ayers J.W., Leas E.C., Johnson D.C., Poliak A., Althouse B.M., Dredze M., and Nobles A.L., Internet searches for acute anxiety during the early stages of the COVID-19 pandemic, JAMA Int. Med. 180 (2020), pp. 1706–1707. doi: 10.1001/jamainternmed.2020.3305.
8. Bai J., Estimating multiple breaks one at a time, Econ. Theory 13 (1997), pp. 315–352. doi: 10.1017/S0266466600005831.
9. Bai Y., Lin C.C., Lin C.Y., Chen J.Y., Chue C.M., and Chou P., Survey of stress reactions among health care workers involved with the SARS outbreak, Psychiatr. Serv. 55 (2004), pp. 1055–1057. doi: 10.1176/appi.ps.55.9.1055 (PMID: 15345768).
10. Banks J., Fancourt D., and Xu X., Mental health and the COVID-19 pandemic. World Happiness Report 2021. Retrieved from https://worldhappiness.report/ed/2021/mental-health-and-the-covid-19-pandemic/ (Last accessed on March 20, 2022), 2021.
11. Chau S.W., Wong O.W., Ramakrishnan R., Chan S.S., Wong E.K., Li P.Y., Raymont V., Elliot K., Rathod S., Delanerolle G., and Phiri P., History for some or lesson for all? A systematic review and meta-analysis on the immediate and long-term mental health impact of the 2002–2003 severe acute respiratory syndrome (SARS) outbreak, BMC Public Health 21 (2021), pp. 1–23.
12. Cho H. and Kirch C., Data segmentation algorithms: Univariate mean change and beyond. Econometrics and Statistics, 2021.
13. Clancy L., Goodman P., Sinclair H., and Dockery D.W., Effect of air-pollution control on death rates in Dublin, Ireland: an intervention study, The Lancet 360 (2002), pp. 1210–1214.
14. Deb S., Analyzing airlines stock price volatility during COVID-19 pandemic through internet search data. Int. J. Fin. Econ., 2021.
15. Dietzel M.A., Braun N., and Schäfers W., Sentiment-based commercial real estate forecasting with Google search volume data, J. Prop. Invest. Fin. 32 (2014), pp. 540–569.
16. Elbarazi I., Saddik B., Grivna M., Aziz F., Elsori D., Stip E., and Bendak E., The impact of the COVID-19 “Infodemic” on well-Being: A cross-Sectional study, J. Multidiscipl. Healthcare 15 (2022), pp. 289.
17. Engle R.F., Wald, likelihood ratio, and Lagrange multiplier tests in econometrics, Handbook Econ. 2 (1984), pp. 775–826.
18. Fernandes B., Biswas U.N., Mansukhani R.T., Casarin A.V., and Essau C.A., The impact of COVID-19 lockdown on internet use and escapism in adolescents, Revista De Psicologia Clinica Con Ninos Y Adolescentes 7 (2020), pp. 59–65. doi: 10.21134/rpcna.2020.mon.2056.
19. Fuller W.A., Introduction to Statistical Time Series, John Wiley & Sons, 2009.
20. Gianfredi V., Provenzano S., and Santangelo O.E., What can internet users' behaviours reveal about the mental health impacts of the COVID-19 pandemic? A systematic review, Public Health 198 (2021), pp. 44–52.
21. Gimbrone C., Rutherford C., Kandula S., Martínez-Alés G., Shaman J., Olfson M., and Keyes K.M., Associations between COVID-19 mobility restrictions and economic, mental health, and suicide-related concerns in the US using cellular phone GPS and Google search volume data, PloS One 16 (2021), pp. e0260931.
22. Giuntella O., Hyde K., Saccardo S., and Sadoff S., Lifestyle and mental health disruptions during COVID-19. Proceedings of the National Academy of Sciences 118(9). doi: 10.1073/pnas.2016632118, 2021.
23. Guliyev H., Determining the spatial effects of COVID-19 using the spatial panel data model, Spat. Stat. 38 (2020), pp. 100443.
24. Halford E.A., Lake A.M., and Gould M.S., Google searches for suicide and suicide risk factors in the early stages of the COVID-19 pandemic, PloS One 15 (2020), pp. e0236777.
25. Helliwell J.F., Layard R., Sachs J., and De Neve J.E., World Happiness Report 2021. New York: Sustainable Development Solutions Network. Retrieved from https://worldhappiness.report/ed/2021/ (last accessed on 12th May, 2020), 2021.
26. Hisler G.C. and Twenge J.M., Sleep characteristics of US adults before and during the COVID-19 pandemic. Social Science & Medicine 276. doi: 10.1016/j.socscimed.2021.113849, 2021.
27. Hossain M.M., Tasnim S., Sultana A., Faizah F., Mazumder H., Zou L., and Ma P., Epidemiology of mental health problems in COVID-19: a review. F1000Research vol.9, 2020.
28. Hsieh K.Y., Kao W.T., Li D.J., Lu W.C., Tsai K.Y., Chen W.J., and Chou F.H.C., Mental health in biological disasters: from SARS to COVID-19, Int. J. Soc. Psychiatry 67 (2021), pp. 576–586.
29. Jarque C.M. and Bera A.K., Efficient tests for normality, homoscedasticity and serial independence of regression residuals, Econ. Lett. 6 (1980), pp. 255–259.
30. John A., Eyles E., Webb R.T., Okolie C., Schmidt L., Arensman E., Hawton K., O'Connor R.C., Kapur N., Moran P., and O'Neill S., The impact of the COVID-19 pandemic on self-harm and suicidal behaviour: update of living systematic review. F1000Research vol.9, 2020.
31. Killgore W.D., Cloonan S.A., Taylor E.C., and Dailey N.S., Mental health during the first weeks of the COVID-19 pandemic in the United States, Front. Psychiatry 12 (2021), pp. 535.
32. Lai K., Lee Y.X., Chen H., and Yu R., Research on web search behavior: How online query data inform social psychology, Cyberpsychol. Behav. Soc. Netw. 20 (2017), pp. 596–602.
33. Lin Y.H., Chiang T.W., and Lin Y.L., Increased internet searches for insomnia as an indicator of global mental health during the COVID-19 pandemic: Multinational longitudinal study, J. Med. Int. Res. 22 (2020), pp. e22181.
34. Liu Y.C., Kuo R.L., and Shih S.R., COVID-19: The first documented coronavirus pandemic in history, Biomed. J. 43 (2020), pp. 328–333.
35. Ljung G. and Box G., On a Measure of Lack of Fit in Time Series Models. Biometrika 65. doi: 10.1093/biomet/65.2.297, 1978.
36. Masaeli N. and Farhadi H., Prevalence of internet-based addictive behaviors during COVID-19 pandemic: A systematic review, J. Addictive Dis. 39 (2021), pp. 468–488.
37. Maunder R.G., Leszcz M., Savage D., Adam M.A., Peladeau N., Romano D., and Schulman R.B., Applying the lessons of SARS to pandemic influenza, Can. J. Public Health 99 (2008), pp. 486–488.
38. Misiak B., Szcześniak D., Koczanowicz L., and Rymaszewska J., The COVID-19 outbreak and Google searches: Is it really the time to worry about global mental health?, Brain Behav. Immunity 87 (2020), pp. 126.
39. Obaid H.S., Dheyab S.A., and Sabry S.S., The Impact of Data Pre-Processing Techniques and Dimensionality Reduction on the Accuracy of Machine Learning. In 2019 9th annual information technology, electromechanical engineering and microelectronics conference (iemecon) (pp. 279–283). doi: 10.1109/IEMECONX.2019.8877011, 2019.
40. Ochwat D., Staying in Stock: How retailers and suppliers are managing the coronavirus crisis. Storebrands Magazine. Retrieved from https://issuu.com/ensembleiq/docs/sbr-0420?fr=sZmIzYTgwODQ2MQ (Last accessed on May 17, 2022), 2020.
41. Pedro A., Cowden R.G., de Filippis R., Jerotic S., Nahidi M., Ori D., Orsolini L., Nagendrappa S., da Costa M.P., Ransing R., and Saeed F., Associations of lockdown stringency and duration with Google searches for mental health terms during the COVID-19 pandemic: A nine-country study, J. Psychiatric Res. 150 (2022), pp. 237–245.
42. Rand W.M., Objective criteria for the evaluation of clustering methods, J. Am. Stat. Assoc. 66 (1971), pp. 846–850.
43. Rawat S. and Deb S., A spatio-temporal statistical model to analyze COVID-19 spread in the USA. J. Appl. Stat. 1–20, 2021.
44. Renardy M., Eisenberg M., and Kirschner D., Predicting the second wave of COVID-19 in Washtenaw County, MI, J. Theor. Biol. 507 (2020), pp. 110461.
45. Robinson E., Sutin A.R., Daly M., and Jones A., A systematic review and meta-analysis of longitudinal cohort studies comparing mental health before versus during the COVID-19 pandemic in 2020, J. Affect. Disord. 296 (2022), pp. 567–576.
46. Rotter D., Doebler P., and Schmitz F., Interests, Motives, and Psychological Burdens in Times of Crisis and Lockdown: Google Trends Analysis to Inform Policy Makers. J. Med. Internet Res. 23(6). doi: 10.2196/26385, 2021.
47. Scott A.J. and Knott M., A cluster analysis method for grouping means in the analysis of variance, Biometrics 30 (1974), pp. 507–512.
48. Silverio-Murillo A., Hoehn-Velasco L., and Tirado A.R., COVID-19 blues: Lockdowns and mental health-related google searches in Latin America, Soc. Sci. Med. 281 (2021), pp. 114040.
49. Skamnia E., Economou P., Bersimis S., Frouda M., Politis A., and Alexopoulos P., Hot spot identification method based on Andrews curves: an application on the COVID-19 crisis effects on caregiver distress in neurocognitive disorder. J. Appl. Stat. 1–20, 2022.
50. Soldi G., Forti N., Gaglione D., Braca P., Millefiori L.M., Marano S., and Pattipati K.R., Quickest detection and forecast of pandemic outbreaks: Analysis of COVID-19 waves, IEEE Commun. Mag. 59 (2021), pp. 16–22.
51. Sprang G. and Silman M., Posttraumatic stress disorder in parents and youth after health-related disasters, Disaster Med. Public Health Prep. 7 (2013), pp. 105–110.
52. Sycińska-Dziarnowska M., Szyszka-Sommerfeld L., Kloda K., Simeone M., Wozniak K., and Spagnuolo G., Mental Health Interest and Its Prediction during the COVID-19 Pandemic Using Google Trends. Int. J. Environ. Res. Public Health 18(23). doi: 10.3390/ijerph182312369, 2021.
53. Theodossiou P., Skewed generalized error distribution of financial assets and option pricing, Multinational Finance J. 19 (2015), pp. 223–266.
54. UNICEF. (2021). What you need to know about the Delta variant: Learn how to keep yourself and your family safe from the Delta variant of COVID-19. (Last accessed on 12th May, 2020), 2021.
55. Van Slyke A., The collapse of health care: the effects of COVID-19 on US Community health centers. Lerner Center for Public Health Promotion: Population Health Research Brief Series 18, 2020.
56. Villadsen A., Patalay P., and Bann D., Mental health in relation to changes in sleep, exercise, alcohol and diet during the COVID-19 pandemic: examination of four UK cohort studies. Psychological Medicine 1–10. doi: 10.1017/S0033291721004657, 2021.
57. Vostrikova L.Y., Detecting “disorder” in multidimensional random processes. In Doklady akademii nauk (Vol. 259, pp. 270–274), 1981.
58. Wade M., Prime H., Johnson D., May S.S., Jenkins J.M., and Browne D.T., The disparate impact of COVID-19 on the mental health of female and male caregivers, Soc. Sci. Med. 275 (2021), pp. 113801. doi: 10.1016/j.socscimed.2021.113801.
59. Wen F., Xiao J., Huang C., and Xia X., Interaction between oil and US dollar exchange rate: nonlinear causality, time-varying influence and structural breaks in volatility, Appl. Econ. 50 (2018), pp. 319–334.
60. Wu P., Fang Y., Guan Z., Fan B., Kong J., Yao Z., Liu X., Fuller C.J., Susser E., Lu J., and Hoven C.W., The psychological impact of the SARS epidemic on hospital employees in China: exposure, risk perception, and altruistic acceptance of risk, Can. J. Psychiatry 54 (2009), pp. 302–311.
61. Yenerall J. and Jensen K., Food security, financial resources, and mental health: Evidence during the COVID-19 pandemic, Nutrients 14 (2021), pp. 161.
62. Zitting K.M., Lammers-van der Holst H.M., Yuan R.K., Wang W., Quan S.F., and Duffy J.F., Google trends reveals increases in internet searches for insomnia during the 2019 coronavirus disease (COVID-19) global pandemic, J. Clin. Sleep Med. 17 (2021), pp. 177–184. doi: 10.5664/jcsm.8810.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplemental Material

Click here for additional data file.^{(178.7KB, pdf)}

Data Availability Statement

[CIT0001] 1.Adesina T., Estimating volatility persistence under a Brexit-vote structural break, Finance Res. Lett. 23 (2017), pp. 65–68. [Google Scholar]

[CIT0002] 2.Akdi Y., Emre Karamanoğlu Y., Ünlü K.D., and Baş C., Identifying the cycles in COVID-19 infection: the case of Turkey. J. Appl. Stat. 1–13, 2022. [DOI] [PMC free article] [PubMed]

[CIT0003] 3.Akhther N. and Sopory P., Seeking and Sharing Mental Health Information on Social Media During COVID-19: Role of Depression and Anxiety, Peer Support, and Health Benefits. Journal of Technology in Behavioral Science 1–16, 2022. [DOI] [PMC free article] [PubMed]

[CIT0004] 4.Alibudbud R., Googling “mental health” after mental health legislation and during the COVID-19 pandemic: an infodemiological study of public interest in mental health in the Philippines, J. Mental Health 31 (2022), pp. 568–575. [DOI] [PubMed] [Google Scholar]

[CIT0005] 5.Amerio A., Odone A., Aguglia A., Gianfredi V., Bellini L., Bucci D., Gaetti G., Capraro M., Salvati S., Serafini G., and Signorelli C., La casa de papel: A pandemic in a pandemic, J. Affect. Disord. 277 (2020), pp. 53. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0006] 6.Awijen H., Zaied Y.B., and Nguyen D.K., Covid-19 vaccination, fear and anxiety: Evidence from Google search trends, Social Sci. Med. 297 (2022), pp. 114820. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0007] 7.Ayers J.W., Leas E.C., Johnson D.C., Poliak A., Althouse B.M., Dredze M., and Nobles A.L., Internet searches for acute anxiety during the early stages of the COVID-19 pandemic, JAMA Int. Med. 180 (2020), pp. 1706–1707.doi: 10.1001/jamainternmed.2020.3305. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0008] 8.Bai J., Estimating multiple breaks one at a time, Econ. Theory 13 (1997), pp. 315–352.doi: 10.1017/S0266466600005831. [DOI] [Google Scholar]

[CIT0009] 9.Bai Y., Lin C.C.., Lin C.Y.., Chen J.Y.., Chue C.M.., and Chou P., Survey of stress reactions among health care workers involved with the SARS outbreak, Psychiatr. Serv. 55 (2004), pp. 1055–1057.Retrieved from 10.1176/appi.ps.55.9.1055 (PMID: 15345768). [DOI] [PubMed] [Google Scholar]

[CIT0010] 10.Banks J., Fancourt D., and Xu X., Mental health and the COVID-19 pandemic. World Happiness Report 2021. Retrieved from https://worldhappiness.report/ed/2021/mental-health-and-the-covid-19-pandemic/ (Last accessed on March 20, 2022), 2021.

[CIT0011] 11.Chau S.W., Wong O.W., Ramakrishnan R., Chan S.S., Wong E.K., Li P.Y., Raymont V., Elliot K., Rathod S., Delanerolle G., and Phiri P., History for some or lesson for all? A systematic review and meta-analysis on the immediate and long-term mental health impact of the 2002–2003 severe acute respiratory syndrome (SARS) outbreak, BMC Public Health 21 (2021), pp. 1–23. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0012] 12.Cho H. and Kirch C., Data segmentation algorithms: Univariate mean change and beyond. Econometrics and Statistics, 2021.

[CIT0013] 13.Clancy L., Goodman P., Sinclair H., and Dockery D.W., Effect of air-pollution control on death rates in Dublin, Ireland: an intervention study, The Lancet 360 (2002), pp. 1210–1214. [DOI] [PubMed] [Google Scholar]

[CIT0014] 14.Deb S., Analyzing airlines stock price volatility during COVID-19 pandemic through internet search data. Int. J. Fin. Econ., 2021.

[CIT0015] 15.Dietzel M.A., Braun N., and Schäfers W., Sentiment-based commercial real estate forecasting with Google search volume data, J. Prop. Invest. Fin. 32 (2014), pp. 540–569. [Google Scholar]

[CIT0016] 16.Elbarazi I., Saddik B., Grivna M., Aziz F., Elsori D., Stip E., and Bendak E., The impact of the COVID-19 “Infodemic” on well-Being: A cross-Sectional study, J. Multidiscipl. Healthcare 15 (2022), pp. 289. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0017] 17.Engle R.F., Wald, likelihood ratio, and Lagrange multiplier tests in econometrics, Handbook Econ. 2 (1984), pp. 775–826. [Google Scholar]

[CIT0018] 18.Fernandes B., Biswas U.N., Mansukhani R.T., Casarin A.V., and Essau C.A., The impact of COVID-19 lockdown on internet use and escapism in adolescents, Revista De Psicologia Clinica Con Ninos Y Adolescentes 7 (2020), pp. 59–65.doi: 10.21134/rpcna.2020.mon.2056. [DOI] [Google Scholar]

[CIT0019] 19.Fuller W.A., Introduction to Statistical Time Series, John Wiley & Sons, 2009. [Google Scholar]

[CIT0020] 20.Gianfredi V., Provenzano S., and Santangelo O.E., What can internet users' behaviours reveal about the mental health impacts of the COVID-19 pandemic? A systematic review, Public Health 198 (2021), pp. 44–52. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0021] 21.Gimbrone C., Rutherford C., Kandula S., Martínez-Alés G., Shaman J., Olfson M., and Keyes K.M., Associations between COVID-19 mobility restrictions and economic, mental health, and suicide-related concerns in the US using cellular phone GPS and Google search volume data, PloS One 16 (2021), pp. e0260931. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0022] 22.Giuntella O., Hyde K., Saccardo S., and Sadoff S., Lifestyle and mental health disruptions during COVID-19. Proceedings of the National Academy of Sciences 118(9),doi: 10.1073/pnas.20166321182021. [DOI] [PMC free article] [PubMed]

[CIT0023] 23.Guliyev H., Determining the spatial effects of COVID-19 using the spatial panel data model, Spat. Stat. 38 (2020), pp. 100443. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0024] 24.Halford E.A., Lake A.M., and Gould M.S., Google searches for suicide and suicide risk factors in the early stages of the COVID-19 pandemic, PloS One 15 (2020), pp. e0236777. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0025] 25.Helliwell J.F., Layard R., Sachs J., and De Neve J.E.., World Happiness Report 2021. New York: Sustainable Development Solutions Network. Retrieved from https://worldhappiness.report/ed/2021/ (last accessed on 12th May, 2020), 2021.

[CIT0026] 26.Hisler G.C. and Twenge J.M., Sleep characteristics of US adults before and during the COVID-19 pandemic. Social Science & Medicine 276. 10.1016/j.socscimed.2021.113849, 2021. [DOI] [PMC free article] [PubMed]

[CIT0027] 27.Hossain M.M., Tasnim S., Sultana A., Faizah F., Mazumder H., Zou L., and Ma P., Epidemiology of mental health problems in COVID-19: a review. F1000Research vol.9, 2020. [DOI] [PMC free article] [PubMed]

[CIT0028] 28.Hsieh K.Y.., Kao W.T.., Li D.J.., Lu W.C.., Tsai K.Y.., Chen W.J.., and Chou F.H.C.., Mental health in biological disasters: from SARS to COVID-19, Int. J. Soc. Psychiatry 67 (2021), pp. 576–586. [DOI] [PubMed] [Google Scholar]

[CIT0029] 29.Jarque C.M. and Bera A.K., Efficient tests for normality, homoscedasticity and serial independence of regression residuals, Econ. Lett. 6 (1980), pp. 255–259. [Google Scholar]

[CIT0030] 30.John A., Eyles E., Webb R.T., Okolie C., Schmidt L., Arensman E., Hawton K., O'Connor R.C., Kapur N., Moran P., and O'Neill S., The impact of the COVID-19 pandemic on self-harm and suicidal behaviour: update of living systematic review. F1000Research vol.9, 2020. [DOI] [PMC free article] [PubMed]

[CIT0031] 31.Killgore W.D., Cloonan S.A., Taylor E.C., and Dailey N.S., Mental health during the first weeks of the COVID-19 pandemic in the United States, Front. Psychiatry 12 (2021), pp. 535. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0032] 32.Lai K., Lee Y.X., Chen H., and Yu R., Research on web search behavior: How online query data inform social psychology, Cyberpsychol. Behav. Soc. Netw. 20 (2017), pp. 596–602. [DOI] [PubMed] [Google Scholar]

[CIT0033] 33.Lin Y.H., Chiang T.W., and Lin Y.L., Increased internet searches for insomnia as an indicator of global mental health during the COVID-19 pandemic: Multinational longitudinal study, J. Med. Int. Res. 22 (2020), pp. e22181. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0034] 34.Liu Y.C., Kuo R.L., and Shih S.R., COVID-19: The first documented coronavirus pandemic in history, Biomed. J. 43 (2020), pp. 328–333. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0035] 35.Ljung G. and Box G., On a Measure of Lack of Fit in Time Series Models. Biometrika 65. 10.1093/biomet/65.2.297, 1978. [DOI]

[CIT0036] 36.Masaeli N. and Farhadi H., Prevalence of internet-based addictive behaviors during COVID-19 pandemic: A systematic review, J. Addictive Dis. 39 (2021), pp. 468–488. [DOI] [PubMed] [Google Scholar]

[CIT0037] 37.Maunder R.G., Leszcz M., Savage D., Adam M.A., Peladeau N., Romano D., and Schulman R.B., Applying the lessons of SARS to pandemic influenza, Can. J. Public Health 99 (2008), pp. 486–488. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0038] 38.Misiak B., Szcześniak D., Koczanowicz L., and Rymaszewska J., The COVID-19 outbreak and Google searches: Is it really the time to worry about global mental health?, Brain Behav. Immunity 87 (2020), pp. 126. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0039] 39.Obaid H.S., Dheyab S.A., and Sabry S.S., The Impact of Data Pre-Processing Techniques and Dimensionality Reduction on the Accuracy of Machine Learning. In 2019 9th annual information technology, electromechanical engineering and microelectronics conference (iemecon) (p. 279-283). 10.1109/IEMECONX.2019.8877011, 2019. [DOI]

[CIT0040] 40.Ochwat D., Staying in Stock: How retailers and suppliers are managing the coronavirus crisis. Storebrands Magazine. Retrieved from https://issuu.com/ensembleiq/docs/sbr-0420?fr=sZmIzYTgwODQ2MQ (Last accessed on May 17, 2022), 2020.

[CIT0041] 41.Pedro A., Cowden R.G., de Filippis R., Jerotic S., Nahidi M., Ori D., Orsolini L., Nagendrappa S., da Costa M.P., Ransing R., and Saeed F., Associations of lockdown stringency and duration with Google searches for mental health terms during the COVID-19 pandemic: A nine-country study, J. Psychiatric Res. 150 (2022), pp. 237–245. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0042] 42.Rand W.M., Objective criteria for the evaluation of clustering methods, J. Am. Stat. Assoc. 66 (1971), pp. 846–850. [Google Scholar]

[CIT0043] 43.Rawat S. and Deb S., A spatio-temporal statistical model to analyze COVID-19 spread in the USA. J. Appl. Stat. 1–20, 2021. [DOI] [PMC free article] [PubMed]

[CIT0044] 44.Renardy M., Eisenberg M., and Kirschner D., Predicting the second wave of COVID-19 in Washtenaw County, MI, J. Theor. Biol. 507 (2020), pp. 110461. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0045] 45.Robinson E., Sutin A.R., Daly M., and Jones A., A systematic review and meta-analysis of longitudinal cohort studies comparing mental health before versus during the COVID-19 pandemic in 2020, J. Affect. Disord. 296 (2022), pp. 567–576. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0046] 46.Rotter D., Doebler P., and Schmitz F., Interests, Motives, and Psychological Burdens in Times of Crisis and Lockdown: Google Trends Analysis to Inform Policy Makers. J. Med. Internet Res. 23(6). 10.2196/26385, 2021. [DOI] [PMC free article] [PubMed]

[CIT0047] 47.Scott A.J. and Knott M., A cluster analysis method for grouping means in the analysis of variance, Biometrics 30 (1974), pp. 507–512. [Google Scholar]

[CIT0048] 48.Silverio-Murillo A., Hoehn-Velasco L., and Tirado A.R., COVID-19 blues: Lockdowns and mental health-related google searches in Latin America, Soc. Sci. Med. 281 (2021), pp. 114040. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0049] 49.Skamnia E., Economou P., Bersimis S., Frouda M., Politis A., and Alexopoulos P., Hot spot identification method based on Andrews curves: an application on the COVID-19 crisis effects on caregiver distress in neurocognitive disorder. J. Appl. Stat. 1–20, 2022. [DOI] [PMC free article] [PubMed]

[CIT0050] 50.Soldi G., Forti N., Gaglione D., Braca P., Millefiori L.M., Marano S., and Pattipati K.R., Quickest detection and forecast of pandemic outbreaks: Analysis of COVID-19 waves, IEEE Commun. Mag. 59 (2021), pp. 16–22. [Google Scholar]

[CIT0051] 51.Sprang G. and Silman M., Posttraumatic stress disorder in parents and youth after health-related disasters, Disaster Med. Public Health Prep. 7 (2013), pp. 105–110. [DOI] [PubMed] [Google Scholar]

[CIT0052] 52.Sycińska-Dziarnowska M., Szyszka-Sommerfeld L., Kloda K., Simeone M., Wozniak K., and Spagnuolo G., Mental Health Interest and Its Prediction during the COVID-19 Pandemic Using Google Trends. Int. J. Environ. Res. Public Health 18(23). 10.3390/ijerph182312369, 2021. [DOI] [PMC free article] [PubMed]

[CIT0053] 53.Theodossiou P., Skewed generalized error distribution of financial assets and option pricing, Multinational Finance J. 19 (2015), pp. 223–266. [Google Scholar]

54. UNICEF, What you need to know about the Delta variant: Learn how to keep yourself and your family safe from the Delta variant of COVID-19. (Last accessed on 12th May, 2020), 2021.

55. Van Slyke A., The collapse of health care: the effects of COVID-19 on US Community health centers. Lerner Center for Public Health Promotion: Population Health Research Brief Series 18, 2020.

56. Villadsen A., Patalay P., and Bann D., Mental health in relation to changes in sleep, exercise, alcohol and diet during the COVID-19 pandemic: examination of four UK cohort studies. Psychological Medicine 1–10. 10.1017/S0033291721004657, 2021.

57. Vostrikova L.Y., Detecting “disorder” in multidimensional random processes. In Doklady Akademii Nauk (Vol. 259, pp. 270–274), 1981.

58. Wade M., Prime H., Johnson D., May S.S., Jenkins J.M., and Browne D.T., The disparate impact of COVID-19 on the mental health of female and male caregivers, Soc. Sci. Med. 275 (2021), pp. 113801. doi: 10.1016/j.socscimed.2021.113801.

59. Wen F., Xiao J., Huang C., and Xia X., Interaction between oil and US dollar exchange rate: nonlinear causality, time-varying influence and structural breaks in volatility, Appl. Econ. 50 (2018), pp. 319–334.

60. Wu P., Fang Y., Guan Z., Fan B., Kong J., Yao Z., Liu X., Fuller C.J., Susser E., Lu J., and Hoven C.W., The psychological impact of the SARS epidemic on hospital employees in China: exposure, risk perception, and altruistic acceptance of risk, Can. J. Psychiatry 54 (2009), pp. 302–311.

61. Yenerall J. and Jensen K., Food security, financial resources, and mental health: Evidence during the COVID-19 pandemic, Nutrients 14 (2021), pp. 161.

62. Zitting K.M., Lammers-van der Holst H.M., Yuan R.K., Wang W., Quan S.F., and Duffy J.F., Google trends reveals increases in internet searches for insomnia during the 2019 coronavirus disease (COVID-19) global pandemic, J. Clin. Sleep Med. 17 (2021), pp. 177–184. doi: 10.5664/jcsm.8810.
