Abstract
The U.S. has taken multiple measures to contain the spread of COVID-19, including the implementation of lockdown orders and social distancing practices. Evaluating social distancing is critical since it reflects the risk of close human interactions. While questionnaire surveys and mobility data-based systems have provided valuable insights, social media data can serve as an additional instrument to help monitor the risk of human interactions during the pandemic. For this reason, this study introduces a social media-based approach that quantifies the pro/anti-lockdown ratio as an indicator of the risk of human interactions. With the aid of natural language processing and machine learning techniques, this study classified lockdown-related tweets and quantified the pro/anti-lockdown ratio for each state over time. The anti-lockdown ratio showed a moderate, negative correlation with the state-level social distancing index on a weekly basis, suggesting that people are more likely to travel out in states where a higher anti-lockdown level is observed. The study further showed that the perceptions expressed on social media can reflect people's behaviors. The findings of the study are of significance for government agencies to assess the risk of close human interactions and to evaluate their policy effectiveness in the context of social distancing and lockdown.
Keywords: COVID-19, Social media, Social distancing, Lockdown, Text classification
1. Introduction
The Coronavirus Disease 2019 (COVID-19) pandemic has resulted in 417,000 deaths in the U.S. as of January 22, 2021 [1]. Current evidence suggests that COVID-19 spreads mainly through direct contact or close proximity to infected patients via respiratory droplets as they cough, talk, or sneeze [2]. To prevent the spread of COVID-19, the U.S. government has taken multiple measures, including implementing orders such as lockdown and social distancing [3]. Social distancing is an intervention to reduce close interactions among people, and it has been demonstrated to be an effective way to reduce the transmission of COVID-19 [4]. Evaluating the risk of close human interactions is critical for U.S. government agencies, as it helps them understand public compliance with these essential measures and evaluate the effectiveness of these orders in reducing the spread of COVID-19.
Researchers have evaluated the risk of human interactions based on questionnaire surveys or mobility data. Questionnaire surveys investigated people's perceptions, behaviors, and plans during the COVID-19 pandemic by asking respondents whether they were self-quarantining, keeping social distance, eating out, or visiting relatives and friends [5]. However, collecting surveys from respondents is often time-consuming [6]. Other researchers leveraged mobile location data to establish interactive monitoring systems with advanced data processing techniques [7,8]. These mobile data-based systems show the likelihood of viral transmission and provide tools for tracking human interactions with quantitative parameters, such as social distancing indices. However, acquiring mobile location data often relies on third-party institutions and data distributors, which adds to the cost of implementation.
On social media platforms, people express their opinions and concerns about the COVID-19 government measures such as stay-at-home orders, and these can be used to reveal as well as to predict the public reaction to the measures [[9], [10], [11]]. For example, people who express their fear and anxiety about the pandemic may have a tendency to follow the lockdown measures, whereas people who post that the measures are merely a political play and have a devastating impact on the economy may not follow them. Such risk perception shown on social media can reflect individuals' behaviors. It has been indicated that the geographic regions with many social media users who opposed the lockdown policy had a higher frequency of close human interactions, thus increasing the risk of viral transmission [12].
Public and private sectors need information on public responses to assess and manage the risk of close human interactions amid the COVID-19 pandemic. While survey-based analysis and mobile data-based systems have provided multiple benefits in tracking human interactions and indicating the risk of viral transmission, leveraging the opinions expressed on social media can offer an additional and novel source of such information. While social media data are inherently imperfect, they provide geographically well-distributed information that reflects people's inclination to comply with government measures. In particular, social media data are available in large quantities almost instantaneously, which allows access to the most up-to-date public responses toward the lockdown measures. Moreover, social media data are cost- and time-efficient, as they are easy to access and relatively cheap to obtain.
For these reasons, this study aims to develop a method that evaluates the state-level risk of human interactions by extracting the perceptions of Twitter users upon the lockdown measures. This study applies natural language processing techniques and machine learning classifiers to identify users' perceptions. It then conducts a spatial-temporal analysis and a correlation analysis to investigate the associations of anti-lockdown perceptions with the case increase and social distancing index. The proposed approach can provide an additional resource to assess the state-level risk amid the COVID-19 pandemic that further benefits government agencies and health institutions.
2. Literature review
2.1. Social media in managing infectious disease
The extensive application of social media opens up useful insights for decision-makers to manage infectious diseases. One primary research branch focuses on discussing how social media can aid communication during a pandemic. A study conducted by Kim and Hawkins [13] revealed that social media could enhance public awareness and encourage health prevention behaviors. In particular, social media can promote preventive hygiene intention. Yang and Sun [14] investigated China's Health Code policy under the COVID-19 pandemic to explore why the public performs voice behavior on social media. Their study showed that the public voice plays a critical role in developing policies and represents cooperation between the government and the businesses to maintain social stability.
Another popular research field leverages social media data to signal early warning or conduct surveillance for infectious diseases [[15], [16], [17], [18]]. For example, Aramaki et al. [18] scraped millions of influenza-related tweets and applied machine learning classifiers to classify tweets into either influenza-related (positive) or -unrelated (negative). Their study showed a high correlation between the influenza epidemics and tweet data and thus concluded that Twitter texts could reflect the real-world infections. Ginsberg et al. [19] discussed how to use Google search queries to track influenza-like illness amid the epidemic. Similar to this study, a recent article validated the credibility of using internet searches and social media data to predict COVID-19 outbreak by displaying a strong correlation between search indices with subsequent infections [20].
2.2. Social media in understanding people's thoughts and perceptions
Social media postings contain a great deal of textual information that can reflect people's perceptions, thoughts, concerns, and mental health. As people's behaviors are in close connection with their thoughts, the information collected from social media may provide a useful resource for decision-makers to understand the risk of people's behaviors amid the health crisis. This hypothesis is built upon the existing body of knowledge called Cognitive Behavioral Theory (CBT). According to the CBT, how people think is closely connected with how they behave: thoughts determine how people feel, and it affects how they behave, and their behaviors ultimately impact the situation [21].
Multiple studies have provided empirical evidence to support this theory, especially the relationship between risk perceptions and behaviors. One study [22] suggested that perceived risk can be an antecedent of a specific behavior. For example, risky drivers perceive driving risks as low, and those who perceive risks as high are less likely to undertake dangerous driving behavior. The relationship between perceived risk and behavior has been studied in education [23], business [[24], [25], [26]], and other sociology and psychology fields [[27], [28], [29]]. In the context of the COVID-19 pandemic, recent publications reported that high risk perception could promote individuals' protective behaviors, such as following social distancing guidelines, washing their hands, and wearing masks to reduce close contact with the virus [30,31]. Researchers have also demonstrated that people's traveling patterns were influenced by their risk perception [32,33].
Since the outbreak of COVID-19, many researchers have applied social media data to investigate individuals' perceptions, thoughts, concerns, behaviors, and mental health [[34], [35], [36], [37]]. For example, Johnson et al. [38] presented a cluster analysis of the contention surrounding vaccines against SARS-CoV-2. Their results revealed that anti-vaccination users on Twitter became highly entangled with undecided users in the main online network, whereas pro-vaccination users were more peripheral in the network. Li et al. [39] applied machine learning techniques to detect tweets related to stress symptoms and showed that people's stress symptoms expressed on Twitter have a strong correlation with increasing cases. They also discovered that the main stressors switched from concerns over the increase in reported deaths to financial burdens and economic downturns. Zhao et al. [40] explored COVID-19-related topics on Sina Microblog in China from December 2019 to February 2020. They observed that public attention initially focused on protection and first aid and then switched to medical services. Zhong et al. [36] examined the association between social media usage and individuals' mental health. Their results revealed that social media usage benefited people by providing informational and emotional support as the virus struck, but resulted in mental health issues when used excessively. Another study, conducted by Hou et al. [41], inspected message data on Weibo, the Baidu search engine, and Ali e-commerce, and described how people's awareness and behavioral responses corresponded to media news and government announcements.
2.3. Methods in indicating the risk of human interactions
For a highly contagious virus like COVID-19, a rapid appraisal of risky behaviors (i.e., close human interactions) is critical to containing the disease. To the best of our knowledge, there are two common approaches to reflect the risk of human interactions: questionnaire surveys and mobility-data-based systems.
Surveys have been used to collect information by asking to what extent a respondent complies with social distancing and the related behaviors during the pandemic [[42], [43], [44]]. Typical questions often include how they interact with others (e.g., virtually or face to face), the frequency and purposes of going out, and their experiences of being exposed to or tested for COVID-19 [[42], [43], [44]]. For example, researchers at Stanford University performed an online survey asking about respondents' health status and actions [42]. The results showed that the majority of the respondents avoided social gatherings to reduce the risk of infection. Czeisler et al. [43] assessed public compliance with stay-at-home orders through an investigation of more than 5,000 respondents; more than 80% of the respondents followed the stay-at-home orders and stayed quarantined. The WalletHub survey included questions such as how the respondents feel about social distancing, how they cope during social distancing, what they wear out in public, how often they go outside, and how they feel about others not practicing social distancing [44]. Its key findings showed that most respondents could keep social distancing and 60% of them wore face masks.
The other popular method addresses this issue by establishing interactive monitoring systems based on spatial-temporal mobile data. The Maryland Transportation Institute developed a platform to visualize and supervise parameters reflecting the pandemic's impacts on mobility, health, and economy [7]. One of the most important daily updated parameters is the social distancing index, which reflects close-distance interactions. Similar to this social distancing index system, the Unacast Company published a social distancing scoreboard to show the risk of close human interactions [8]. Its criteria include the changes in average distance traveled, visits to non-essential venues, and human encounters [45]. These social distancing scores range from A to F (A implies the lowest risk, while F implies the highest) to assess each state's risk level in following the social distancing policies [46].
2.4. Research objectives
Nevertheless, among the reviewed studies, there has been a lack of work that utilizes social media data to evaluate the risk of people's behaviors in the pandemic. While survey-based analysis and mobile data-based systems are useful for assessing the risk of human interactions, accessing these data may require a vast amount of time and cost. While assessments based on social media may not be as precise as survey results or mobile data-based instruments, they have the advantages of rapidity, spatial coverage, and cost-effectiveness.
Therefore, this study aims to introduce a social media-based approach through the investigation of lockdown-related Twitter postings and to quantify the anti-lockdown ratio to reflect the risk of human interactions in each U.S. state. The rest of this study is structured as follows. Section 3 presents the data preparation and the development of the text classification pipelines. Section 3 also defines the pro/anti-lockdown ratio and introduces the social distancing index. Section 4.1 exhibits the pro/anti-lockdown ratio from temporal and spatial dimensions. Section 4.2 checks its correlation with the social distancing index and demonstrates the credibility of using social media data to evaluate the risk of human interactions. Section 5 discusses the findings, significance, implications, and limitations.
3. Methods
3.1. Data preparation
This study used the Twitter Search API with the key search terms "lockdown" and "reopen" to download related tweets from April 21 to July 21, 2020. This scraping process generated 30,851,895 lockdown-related and 10,082,646 reopen-related records. Each record contains a user's profile information (e.g., username, description, registration location, friends, and followers), posted time, tweet content, and retweet information. Tweets used for the analysis include both original tweets and retweets. As a retweet has the same content as its original tweet, it was considered that a user who retweeted a tweet held the same perception as the user who originally posted it. This study selected these two opposite keywords, "lockdown" and "reopen," given that they indicated a change of lockdown status and both topics raised extensive discussion on Twitter. Moreover, using this pair of opposite keywords could generate enough data and reduce data bias. While there were other terms describing people's opinions regarding the lockdown, such as "stay at home," "shut down," and "open up," these phrases were not used, as they might not be specific to the lockdown policy and could bring much noise into the dataset.
Due to system issues, the downloaded "reopen" dataset was missing three days' data files, from June 3 to June 5, 2020. This study referred to an external dataset [47] (which used more than 90 keywords and hashtags to download COVID-19 pandemic related tweets) to fill in the missing data. All the records were stored in the JavaScript Object Notation (.json) format and converted to Excel (.xlsx) files. Records without registration location information or with a location outside the U.S. were removed first, and then tweets not containing the keywords "lockdown" or "reopen" were removed. These two steps reduced the size of the lockdown dataset to 3,337,435 records and the size of the reopen dataset to 3,480,777 records. As some records contained both the "lockdown" and "reopen" words in a tweet, the duplicated records were removed. As a result, the final dataset contained 6,774,678 records.
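The location/keyword filtering and deduplication steps above can be sketched with pandas; the column names (`tweet_id`, `user_location`, `text`) are illustrative assumptions about the converted records, not the actual schema of the downloaded files.

```python
# Hypothetical sketch of the record-filtering steps; column names are assumptions.
import pandas as pd

US_STATES = {"NY", "CA", "TX"}  # abbreviated here; the full set covers 50 states + DC

def filter_records(df: pd.DataFrame) -> pd.DataFrame:
    """Keep U.S.-located records whose text mentions a keyword, then dedupe."""
    has_location = df["user_location"].notna()
    in_us = df["user_location"].str.upper().isin(US_STATES).fillna(False)
    has_keyword = df["text"].str.contains(r"lockdown|reopen", case=False, na=False)
    kept = df[has_location & in_us & has_keyword]
    # A tweet mentioning both "lockdown" and "reopen" appears in both
    # datasets, so duplicates are dropped after merging.
    return kept.drop_duplicates(subset=["tweet_id"])
```

In practice, mapping free-text registration locations to U.S. states is messier than this membership test suggests; the sketch only illustrates the order of the filtering steps.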
To build the training dataset for text classification, this study selected the top 5,000 most frequently occurring unique tweets from the lockdown dataset and the reopen dataset respectively and classified each of them into 1) class 1: the attitude opposing lockdown or supporting reopen, 2) class 0: the attitude is not clear from the tweet, and 3) class −1: the attitude supporting lockdown or opposing reopen. Labeling these tweets followed a polling process. Two authors first labeled each of the selected 10,000 tweets separately. If a tweet received the same label from both authors, the label was used for the tweet. Otherwise, the third author labeled the tweet, and the final label followed the majority of the three opinions. Table 1 presents some typical examples of class 1 and class −1 tweets.
Table 1.
Examples of “anti-lockdown” and “pro-lockdown” tweets.
| Attitude | Description | Tweet Example |
|---|---|---|
| Class 1: Anti-lockdown (support reopen or oppose lockdown) | Express attitudes not supporting lockdown | I am not in favor of another nationwide Covid-19 lockdown! Period! Are you? |
| | Concern that lockdown is ineffective for containing the pandemic | Japan has had less than 1000 Covid deaths. It is 12 times more densely populated than the US, and they have more elderly per capita than any other nation. They never did a complete lockdown. How did they do it? Virtually everyone wears a mask. So simple. We look ridiculous. |
| | Describe the negative impacts resulting from lockdown | The lockdown is on pace to have a more devastating impact on America than the virus itself —800% increase in suicide hotline calls —Increase in drug; alcohol dependency —Families separated in quarantine. Reopen America now! |
| | Consider lockdown as a political play | Stop playing politics with people's livelihoods. Reopen New Jersey. |
| Class −1: Pro-lockdown (oppose reopen or support lockdown) | Express attitudes supporting lockdown | This lockdown was a blessing in disguise, some people won't understand this though. |
| | Concern about the consequences of the rush to reopen | When Texas rushed to reopen without adequate safety measures—and without meeting the federal standards set by the Administration—the state was going to unnecessarily increase the spread of Covid19. |
| | Recognize that the situation did not meet the requirement to reopen | Harvard researchers estimate we need 3X as many tests as we are doing now to safely reopen. About 500k people a day. States should have 152 tests per 100k people. No states at that level yet. |
| | Be aware of the potential threats resulting from reopening | CDC documents warned Trump a full reopening of schools is 'highest risk' for the virus to spread. Trump pushed for schools to reopen anyway, because he doesn't care if our kids or teachers get sick and die. |
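The three-annotator polling process used for labeling can be sketched as a small helper; the function name and signature are illustrative, not from the study.

```python
# Minimal sketch of the labeling poll: two annotators label independently,
# and on disagreement a third annotator's vote decides by majority.
from collections import Counter
from typing import Optional

def resolve_label(label_a: int, label_b: int, label_c: Optional[int] = None) -> int:
    """Return the agreed label (1, 0, or -1) for one tweet."""
    if label_a == label_b:
        return label_a
    # Disagreement: the third annotator labels the tweet, and the final
    # label follows the majority of the three opinions. (The study does not
    # describe the rare case where all three annotators disagree.)
    votes = Counter([label_a, label_b, label_c])
    return votes.most_common(1)[0][0]
```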
In the 5,000 selected lockdown-related samples, there were 2,266, 2,375, and 359 tweets labeled as class 1, class 0, and class −1, respectively. In the 5,000 selected reopen-related samples, there were 1,128, 2,162, and 1,710 tweets labeled as class 1, class 0, and class −1, respectively. To check the generalization ability of the trained model, this study randomly selected another 1,000 different and unique tweets from each dataset (with no overlap with the training dataset) and labeled them based on the same polling process. Among these 2,000 testing samples, there were 471, 1,184, and 345 samples labeled as class 1, class 0, and class −1, respectively. The research framework to build and implement the pipeline for text classification is presented in Fig. 1.
Fig. 1.
The model framework to build and implement the model for text classification.
Before processing the textual data, this study applied several steps to clean the text, as presented in the "Text Cleaning" box (Fig. 1). Short URLs, @username, RT @username, digits, emojis, and punctuation in a tweet were removed, and then the stop-words that are not informative, such as "the," "is," and "and," were stripped from the text. Next, each tweet was tokenized into a list of separate words and characters. As words in a tweet can appear in different tenses or forms, the tokenized words were lemmatized to their base forms. This cleaning process was completed with the aid of the Natural Language Toolkit (NLTK) python library [48].
3.2. Text augmentation
It was noted that the percentages of class 1, class 0, and class −1 were not balanced in the training dataset. For example, the distribution in the lockdown training dataset was class 1 (2,266), class 0 (2,375), and class −1 (359). Such imbalance in the training dataset could result in reduced prediction performance for the minority class. In other words, the trained model could achieve 92.8% accuracy ((2,266 + 2,375)/5,000) by predicting only class 1 and class 0 correctly, and thus it would lack the competence to identify class −1 tweets. This study utilized a text augmentation technique called Easy Data Augmentation (EDA) [49] to balance the training samples. The EDA technique does not require the NLP model to be pre-trained on any external dataset and has been demonstrated to effectively enhance performance on small datasets. The EDA model consists of four operations that increase the dataset volume: 1) synonym replacement, 2) random insertion, 3) random swap, and 4) random deletion [49], as presented in Table 2.
Table 2.
Text augmentation operators and examples.
| Operator | Ratio | Description | Example |
|---|---|---|---|
| Original tweet: Despite clear evidence that the fatality rate is lower in states with less restrictive lockdown policies, liberals; their media allies are still pushing this "shut it all down" narrative. It's as if they want you bankrupted, depressed and jobless. Cleaned tweet: despite clear evidence fatality rate lower state less restrictive lockdown policy liberal medium ally still push shut down narrative want bankrupt depressed jobless | | | |
| Synonym Replacement (SR) | 0.1 | Replace each of n random words in the sentence with one of its randomly selected synonyms. The synonym library is built on WordNet. | despite clear **prove** fatality rate lower state less restrictive lockdown policy liberal medium ally still push **exclude** down narrative want bankrupt depressed jobless |
| Random Insertion (RI) | 0.1 | Find a random synonym of a random word in the sentence. Insert that synonym into an arbitrary position in the sentence. Do this n times. | despite clear evidence fatality rate lower state less restrictive lockdown policy liberal medium ally **labour** still push shut down narrative want bankrupt depressed **openhanded** jobless |
| Random Swap (RS) | 0.1 | Choose two random words in the sentence and swap their positions. Do this n times. | despite clear evidence fatality rate lower state **jobless** **narrative** lockdown policy liberal medium ally still push shut down **restrictive** want bankrupt depressed **less** |
| Random Deletion (RD) | 0.1 | Randomly remove each word in the sentence with a probability p = α. | despite clear evidence fatality rate lower state less restrictive policy liberal medium ally still push shut down narrative want bankrupt depressed |
This augmentation process enlarged the minority class −1 in the lockdown training samples from 359 to 2,154 and the minority class 1 in the reopen training samples from 1,128 to 2,256. As a result, there were 2,266 class 1, 2,375 class 0, and 2,154 class −1 "lockdown" training samples, and 2,256 class 1, 2,162 class 0, and 1,710 class −1 "reopen" training samples. In the final training set, there were 4,522 class 1 samples, 4,537 class 0 samples, and 3,864 class −1 samples. Following the recommendations of the original article, this study set the parameter for each of the four operations to α = 0.1, where α indicates the percentage of words in a sentence that are changed [49]. Since tweets vary in length, a longer tweet can absorb more noise while maintaining its original content than a shorter one. To compensate for this, the number of words changed in a tweet was set to n = αl, where l is the length of the tweet [49]. For a tweet with few words, this setting ensured that at least one word was changed.
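The four EDA operations can be sketched as follows; to keep the example self-contained, a toy `SYNONYMS` dictionary stands in for the WordNet lookup that EDA actually uses.

```python
# Self-contained sketch of the four EDA operations with alpha = 0.1;
# the SYNONYMS dictionary is a toy stand-in for WordNet.
import random

SYNONYMS = {"shut": ["exclude", "close"], "evidence": ["prove"]}

def eda(words, alpha=0.1):
    """Return one augmented variant per EDA operation for a tokenized tweet."""
    n = max(1, int(alpha * len(words)))  # change at least one word per tweet
    out = {}

    # 1) Synonym replacement: swap up to n random words for a synonym.
    sr = words[:]
    candidates = [i for i, w in enumerate(sr) if w in SYNONYMS]
    for i in random.sample(candidates, min(n, len(candidates))):
        sr[i] = random.choice(SYNONYMS[sr[i]])
    out["SR"] = sr

    # 2) Random insertion: insert a synonym of a random word, n times.
    ri = words[:]
    for _ in range(n):
        w = random.choice([w for w in ri if w in SYNONYMS] or ri)
        ri.insert(random.randrange(len(ri) + 1), SYNONYMS.get(w, [w])[0])
    out["RI"] = ri

    # 3) Random swap: swap the positions of two random words, n times.
    rs = words[:]
    for _ in range(n):
        i, j = random.randrange(len(rs)), random.randrange(len(rs))
        rs[i], rs[j] = rs[j], rs[i]
    out["RS"] = rs

    # 4) Random deletion: drop each word with probability p = alpha.
    rd = [w for w in words if random.random() > alpha]
    out["RD"] = rd or [random.choice(words)]  # never return an empty tweet
    return out
```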
3.3. Text vectorization
Term Frequency-Inverse Document Frequency (TF-IDF) was used to vectorize tweets into features. TF-IDF is a term weighting method implemented in text similarity, text classification, and information retrieval. Although TF-IDF cannot capture the semantic structure of a text, it is a useful algorithm for dealing with a large set of texts due to its simplicity and fast computation [50]. In TF-IDF, TF measures the number of words and their frequencies in each document, while IDF is incorporated to reduce the weights of common words in the corpus. The goal of using TF-IDF is to scale down the impact of words that occur frequently but are empirically less informative in a given training dataset [50]. The mathematical representation of the TF-IDF method is given below:
$$w_{i,j} = tf_{i,j} \times \log\frac{N}{df_i} \tag{1}$$

where $w_{i,j}$ represents word $i$'s weight in tweet $j$, $tf_{i,j}$ denotes the frequency of word $i$ in tweet $j$, $N$ is the total number of tweets, and $df_i$ is the number of tweets in which word $i$ appears. Words with a high TF-IDF weight generally indicate strong relationships with the tweets in which they appear. Compared to the Bag of Words method, which simply counts word frequencies, TF-IDF distinguishes the more important words from the less important ones. Although TF-IDF does not capture word positions or similarities in a document, it is an efficient and simple algorithm for matching words in a query to documents [51]. Due to its simplicity and fast computation, TF-IDF is beneficial when dealing with a large set of Twitter data. In this study, the scikit-learn python library [52] was applied to compute TF-IDF.
Unlike the TF-IDF method, word embedding techniques have the capability to capture the semantic meanings of words. These alternatives are generally applied to help compute text similarity and perform text classification by converting each word into a pre-trained vector of features. However, this study chose not to use word embedding methods given the following reasons. A word embedding method might treat those augmented data as the same semantic samples. For example, the original tweet was “please reopen America,” and the EDA derived a second tweet using synonym replacement, “please open America.” In this simple case, these two samples might be identical vectors of features according to the word embedding method, and thus the EDA technique might not essentially increase the training samples. As a result, the trained model could still generate a reduced prediction performance for the minor class. Moreover, a word embedding is a much more complicated representation of words and carries more hidden information, which might unnecessarily create more “noisy” patterns in this classification task and increase the difficulty to discriminate the meanings between tweet samples.
3.4. Multi-class classifiers
After each tweet was converted into a vector of features, six classifiers were selected to perform the multi-class classification: Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), and Neural Network (NN). Models using these classifiers combined with the TF-IDF technique were constructed based on the scikit-learn python library [52].
A DT is a decision support technique that uses a tree-like structure to represent data and make decisions. In a classification task, a DT classifier can classify a sample by using its leaves to denote class labels and branches to represent conjunctions of features that result in those class labels. A RF is an ensemble learning method that is built on a multitude of DTs [53]. Compared to a single DT, the RFs can reduce variance and control overfitting by training on various subsets of the datasets and outputting the class that is the average prediction of individual trees. The NB classifiers are a family of probabilistic classifiers built based on the Bayes theorem in statistics [54]. In this study, the Multinomial NB (MNB) was applied to perform the multi-class classification task. MNB implements the classification algorithms for data that are multinomially distributed and is suitable for classification with discrete features, such as words frequencies [55]. SVMs are supervised learning models pervasively applied in classification tasks. An SVM intends to apply the kernel functions to map the data in the lower dimensional space into a higher dimensional feature space where it can establish a hyperplane or a set of hyperplanes to search for the maximum margin to split the dataset [56]. LR is a machine learning technique that was originally designed for binary classification. In LR, the core uses a logistic function to model the conditional probability of the predicted label based on independent input variables [57]. In this study's multi-class case, the training process used a one-vs-rest scheme and applied the multinomial cross-entropy loss. Last, an NN is an interconnected network of neurons that applies mathematical activation functions for information processing. This study applied the Adam solver to update network weights. Adam is an algorithm for the first-order gradient-based optimization of stochastic objective functions [58].
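One of the six TF-IDF + classifier combinations (here RF) can be sketched as a scikit-learn pipeline; the eight labeled tweets below are made up for illustration and are not the study's training data.

```python
# Sketch of a TF-IDF + Random Forest model as a scikit-learn pipeline;
# the labeled sample is invented for illustration only.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

train_texts = [
    "reopen america now", "end the lockdown", "open up the economy",
    "stay home stay safe", "keep the lockdown going", "do not reopen yet",
    "what is happening today", "news update this morning",
]
train_labels = [1, 1, 1, -1, -1, -1, 0, 0]  # 1: anti-lockdown, -1: pro-lockdown, 0: unclear

model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
model.fit(train_texts, train_labels)
pred = model.predict(["reopen the states now"])[0]  # one of 1, 0, -1
```

Swapping the `"clf"` step for `DecisionTreeClassifier`, `MultinomialNB`, `SVC`, `LogisticRegression`, or `MLPClassifier` yields the other five models compared in Table 3.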
3.5. Performance measurement
As the class distributions in the testing set were not balanced, accuracy by itself might not reflect the model's performance on each class. In addition to testing accuracy, this study used precision, recall, and F1-score to estimate the classification performance. Precision measures the fraction of true positive cases over all cases that a model predicts as positive, while recall measures the fraction of true positive cases over all actual positive cases. The F1-score is a measure of test accuracy, representing a combination of recall and precision [59]. The mathematical formulas of precision, recall, F1-score, and accuracy are presented in formulas 2 to 5, in which TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative. Each model's performance on the testing set is exhibited in Table 3.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
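These per-class metrics can be computed with scikit-learn; the `y_true`/`y_pred` labels below are a made-up toy example, not the study's test results.

```python
# Computing formulas (2)-(5) per class with scikit-learn on invented labels.
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

y_true = [1, 1, 0, 0, 0, -1, -1, 1, 0, -1]
y_pred = [1, 0, 0, 0, -1, -1, -1, 1, 0, 0]

# One precision/recall/F1 value per class, in the order given by `labels`.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[1, 0, -1], zero_division=0
)
accuracy = accuracy_score(y_true, y_pred)  # 7 of 10 correct here
```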
Table 3.
Performance of multiple classifiers on the n = 2,000 testing samples.
| | TF-IDF + DT | TF-IDF + RF | TF-IDF + MNB | TF-IDF + SVM | TF-IDF + LR | TF-IDF + NN |
|---|---|---|---|---|---|---|
| Precision | ||||||
| Class 1 | 0.68 | 0.80 | 0.81 | 0.79 | 0.72 | 0.73 |
| Class 0 | 0.35 | 0.56 | 0.43 | 0.44 | 0.39 | 0.40 |
| Class -1 | 0.33 | 0.84 | 0.44 | 0.48 | 0.38 | 0.43 |
| Recall | ||||||
| Class 1 | 0.55 | 0.87 | 0.65 | 0.70 | 0.62 | 0.67 |
| Class 0 | 0.48 | 0.67 | 0.62 | 0.54 | 0.47 | 0.46 |
| Class -1 | 0.39 | 0.36 | 0.47 | 0.53 | 0.46 | 0.47 |
| F1-score | ||||||
| Class 1 | 0.61 | 0.83 | 0.72 | 0.74 | 0.67 | 0.70 |
| Class 0 | 0.40 | 0.61 | 0.51 | 0.48 | 0.43 | 0.43 |
| Class -1 | 0.35 | 0.50 | 0.46 | 0.50 | 0.42 | 0.45 |
| Training Accuracy | 99.18% | 97.88% | 80.39% | 89.89% | 99.13% | 99.13% |
| Testing Accuracy | 50.30% | 73.45% | 61.10% | 63.10% | 55.75% | 58.50% |
According to Table 3, the model trained with TF-IDF + RF outperformed the other five models, showing a higher (or equal) F1-score on every class and a higher testing accuracy. Overall, the F1-score on class -1 (pro-lockdown) is lower than on the other two classes, possibly because there are fewer class -1 training samples (some of the training samples were augmented using the EDA technique). It is also possible that the variance of the textual content of anti-lockdown tweets is smaller; that is, fewer and less diverse words were used to express anti-lockdown attitudes, such as “Reopen the American” and “Reopen the States,” while more diverse words were used to express pro-lockdown attitudes (possibly because pro-lockdown tweets often listed specific reasons). Hence, the model is better at classifying anti-lockdown tweets. The models trained with DT, LR, and NN show a severe overfitting problem, presenting high training accuracy but low testing accuracy. Therefore, this study selected and applied the TF-IDF + RF model to classify each tweet in the dataset.
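For reference, EDA (Easy Data Augmentation) generates extra training samples through simple word-level edits. Below is a minimal sketch of one such operation, random swap; this is an illustrative reimplementation, not the study's code, and the input sentence is invented.

```python
import random

def eda_random_swap(text, n_swaps=1, seed=0):
    # One of the four EDA operations (random swap): exchange the positions
    # of two randomly chosen words n_swaps times. Full EDA also includes
    # synonym replacement, random insertion, and random deletion.
    rng = random.Random(seed)
    words = text.split()
    for _ in range(n_swaps):
        i, j = rng.randrange(len(words)), rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

augmented = eda_random_swap("we must reopen the states now")
```

The augmented sentence keeps the same vocabulary and label as the original, which is why EDA is suited to padding out a minority class.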
Even within the 10,000 labeled samples, the variance inside a class is very high (e.g., tweets in class 0 appear to be very different), which increases the difficulty of training the model. Nevertheless, the 10,000 manually labeled tweets and their retweets constitute 53.3% (3,607,702 out of 6,774,678) of the whole dataset, as retweets share the same textual content as the labeled tweets. The classifier therefore achieves the training accuracy on this 53.3% of the data. As a result, the expected accuracy on the whole dataset is about 86.5% (97.9% (training) × 53.3% + 73.5% (testing) × 46.7%).
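The expected whole-dataset accuracy above is a coverage-weighted average of training and testing accuracy, which can be verified directly:

```python
# Tweets sharing labeled text are classified at training accuracy,
# the remainder at testing accuracy.
labeled_share = 3_607_702 / 6_774_678   # about 53.3% of the dataset
expected_acc = 0.979 * labeled_share + 0.735 * (1 - labeled_share)
```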
3.6. Pro- and anti-lockdown ratio and social distancing index
Social media are often criticized for polarized opinions, meaning that the collected data may not reflect public opinion [60]. Thus, the distribution of opinions on the lockdown orders was compared across various poll results. From the poll data in Table 4, it was found that most poll respondents were either supporting or opposing lockdown, and very few were neutral [[61], [62], [63]]. Most of the collected tweets about lockdown and reopening also showed an opinion either supporting or opposing lockdown, and very few tweets showed a neutral or unsure opinion. This indicates that the degree of polarization was not significantly different across the platforms, since people tended to choose a side on the matter of lockdown policy.
Table 4.
Poll results on lockdown and reopen policy.
| Polling organization | Poll period, responders, and sampling error margin | Oppose lockdown or support reopen (%) | Neither oppose nor support lockdown/reopen (%) | Support lockdown or oppose reopen (%) |
|---|---|---|---|---|
| Monmouth University [61] | April 30–May 4, 2020; 808 adults; 3.5% | 29 | 8 | 63 |
| Associated Press-NORC Center for Public Affairs Research [62] | April 16–20, 2020; 1057 adults; 4.0% | 43 | 1 | 56 |
| NPR-Ipsos survey [63] | July 30–31, 2020; 1115 adults; 3.3% | 36 | 5 | 59 |
In the next step, this study used the classified tweets to quantify the anti-lockdown ratio. The anti-lockdown ratio is calculated as the number of “class 1” tweets divided by the sum of “class 1” and “class −1” tweets. A ratio higher than 0.5 means that there are more anti-lockdown tweets than pro-lockdown tweets. The anti-lockdown ratio can also be directly converted to the pro-lockdown ratio, since the sum of the anti- and pro-lockdown ratios is 1. The mathematical formulas are presented below.
| $\text{Anti-lockdown ratio} = \frac{N_{\text{class } 1}}{N_{\text{class } 1} + N_{\text{class } -1}}$ | (6) |
| $\text{Pro-lockdown ratio} = 1 - \text{Anti-lockdown ratio} = \frac{N_{\text{class } -1}}{N_{\text{class } 1} + N_{\text{class } -1}}$ | (7) |
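Formulas (6) and (7) amount to the following computation; the tweet counts here are illustrative, not the study's figures:

```python
def lockdown_ratios(n_anti, n_pro):
    # n_anti: number of class 1 (anti-lockdown) tweets;
    # n_pro: number of class -1 (pro-lockdown) tweets.
    # Neutral (class 0) tweets do not enter the ratios.
    total = n_anti + n_pro
    anti_ratio = n_anti / total
    return anti_ratio, 1.0 - anti_ratio  # the two ratios sum to 1

anti, pro = lockdown_ratios(n_anti=596, n_pro=404)  # illustrative counts
```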
The anti-lockdown ratio indicates people's negative perceptions of the lockdown policy. To examine the relationship between the anti-lockdown ratio and the public's social distancing behaviors, this study obtained the social distancing index data from the COVID-19 Impact Analysis Platform (https://data.covid.umd.edu/) published by the Maryland Transportation Institute [7]. According to its documentation, the social distancing index reflects the chance of close-distance interactions. The index is computed based on Equation (8) using six mobility metrics: the percentage of residents staying at home, the percentage reduction of all trips compared to the pre-COVID-19 benchmark, the percentage reduction of work trips, the percentage reduction of non-work trips, the percentage reduction of travel distance, and the percentage reduction of out-of-county trips. The weight for each metric was chosen based on the share of resident and visitor trips. The index ranges from 0 to 100, and a higher number indicates that more residents stay at home while fewer visitors enter the county or state. A smaller social distancing index implies a higher chance that people stay closer together [7].
| $\mathrm{SDI} = \sum_{i=1}^{6} w_i m_i$ | (8) |
where $m_i$ denotes each of the six mobility metrics and $w_i$ its weight.
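In general form, Equation (8) is a weighted sum of the six mobility metrics. The sketch below uses hypothetical equal weights and invented metric values purely for illustration; the platform derives its actual weights from the share of resident and visitor trips.

```python
def social_distancing_index(metrics, weights):
    # Weighted sum of the six mobility metrics; weights must sum to 1.
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * metrics[k] for k in weights)

metrics = {  # each metric on a 0-100 scale (illustrative values)
    "pct_staying_home": 40.0,
    "pct_reduction_all_trips": 30.0,
    "pct_reduction_work_trips": 50.0,
    "pct_reduction_nonwork_trips": 25.0,
    "pct_reduction_travel_distance": 35.0,
    "pct_reduction_out_of_county_trips": 20.0,
}
weights = {k: 1 / 6 for k in metrics}  # hypothetical equal weights
sdi = social_distancing_index(metrics, weights)
```

Because each metric lies in [0, 100] and the weights sum to 1, the index also lies in [0, 100].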
4. Results
4.1. Temporal and spatial analysis
For the temporal analysis, this study binned the classified tweets by day and then calculated the daily pro- and anti-lockdown ratios. Fig. 2a presents the trends of the national anti-lockdown ratio and the average social distancing index. Fig. 2b exhibits the number of states under the stay-at-home order [64]. Fig. 2c illustrates the relationship between the pro-lockdown ratio and the daily new positive cases [65]. Fig. 2d shows the daily tweet volume.
Fig. 2.
a. Trends of national anti-lockdown ratio and social distancing index. b. The number of states under the stay-at-home order [64]. c. Relation between the pro-lockdown ratio and the daily reported infections [65]. d. Daily volume of related tweets.
The average anti-lockdown ratio is 0.596, about 10–15% higher than the polling results (Table 4). A couple of reasons may explain this observation. First, the trained model generated a lower F1-score for pro-lockdown than anti-lockdown tweets (Table 3), possibly resulting from fewer pro-lockdown training samples and the more diverse words used to describe the pro-lockdown attitude. As a consequence, the trained model has limited capability to identify pro-lockdown tweets. Second, while this study compensated for the missing reopen data from June 3 to June 5 by referring to an external dataset, that dataset did not use the keyword “reopen” to download related tweets, which might have resulted in a partial loss of pro-lockdown data. Third, the ratio was computed based on the number of tweets rather than the number of accounts, so accounts with an anti-lockdown attitude may have published more tweets than accounts with a pro-lockdown attitude. While social media is an inherently imperfect information source (e.g., “reopen” campaigns might use bots to fuel the discussion surrounding stay-at-home orders in the pandemic [66]), the goal of this section is to shed light upon the associations of the anti-lockdown ratio with other metrics (e.g., case increase, social distancing index).
In Fig. 2a, the national social distancing index remained high at the beginning, implying that people were distancing during the lockdown period. As the states lifted their lockdown policies between late April and early June, the social distancing index showed a gradual downward trend, implying more frequent traveling behaviors. However, the social distancing index did not show a clear uptrend amid the second wave of the outbreak. Another interesting observation from Fig. 2a is that the social distancing index showed a periodic weekly pattern: it spiked on weekends and fell to lows on Fridays. This observation suggests that people traveled outside mainly for work on weekdays and stayed home to rest on weekends. In addition, the anti-lockdown ratio gradually increased from late April to early June. The most likely explanation is that although people might have understood that the extension of lockdown was for their own good to contain the infections, many lamented on social media how they missed doing ordinary things and hoped to end the lockdown.
According to Fig. 2c, the reported case number increased dramatically from early-to-mid June, possibly due to the lifting of stay-at-home orders in most states. The pro-lockdown ratio grew in tandem with the case number, demonstrated by a moderate positive Pearson correlation (R = 0.59, p < 0.05). At this time, most Twitter postings argued that anti-lockdown protests contributed to the increasing infections and deaths, and that the rush to reopen could expose people to the risk of viral transmission [67]. Hence, the anti-lockdown ratio correspondingly decreased in this period (Fig. 2a). This transitional pattern from anti-lockdown to pro-lockdown on social media demonstrated the associations of the anti-lockdown ratio with other metrics, such as the number of infections and deaths.
In addition, the lockdown policy raised an extensive discussion on Twitter during April and May (Fig. 2d). As many states started to lift the stay-at-home orders from mid-May (Fig. 2b), the discussion volume showed a decreasing trend (Fig. 2d). However, the second wave of the COVID-19 outbreak that hit the U.S. around June 15 led to a resurgence of the discussion on Twitter regarding the necessity of lockdown orders. Although the majority were aware of the risk of infections, some online users were concerned that a second nationwide lockdown might further hurt small businesses' chances of surviving the pandemic.
To further understand the popular opinions in support of the pro- or anti-lockdown positions, this study examined the most frequently occurring pro- and anti-lockdown tweets between April and May. The major opinions in support of lockdown orders argued that 1) the lockdown order was a blessing that helped to contain the spread of infections, 2) the country could not afford to risk a premature reopening since the curve was not flattening, 3) there was no robust testing and tracking program to guarantee a safe reopening, and 4) reopen campaigns used bot accounts to disseminate information in support of reopening the country. Conversely, the mainstream anti-lockdown views held that 1) the lockdown policy was not necessary to contain the pandemic, 2) the lockdown was an overreaction and a violation of constitutional rights, 3) an extended lockdown could cause irreversible damage to the economy and society (e.g., bankruptcies, depressions, and suicides), and 4) it was a political play.
Another interesting observation from Fig. 2a is that the anti-lockdown ratio continued to increase in early June. After manual examination of the popular anti-lockdown tweets, it was found that this increase was associated with the Black Lives Matter (BLM) movements that occurred in early June. People on Twitter argued that it was unfair to allow BLM protests but forbid other forms of protests and outside gatherings. For example, a frequently retweeted tweet stated that “we were forced to stay home, close businesses, stop our kids education, let loved ones die alone and Antifa/BLM can riot and ‘protest’ by the 1000's and its okay.” These people criticized the government for its unequal treatment of BLM protests and anti-lockdown protests. Such voices cast doubt on the necessity of lockdown orders on social media and shook public trust in the government's measures.
In addition to the temporal analysis, this study conducted a spatial analysis to compare the anti-lockdown ratio across the states, as exhibited in Fig. 3. The data were binned based on the user's registration location, and then the anti-lockdown ratio for each state was computed. For example, the registration locations “California, USA,” “Los Angeles,” “California,” and “Santa Monica CA” all indicate California. Further exploration of the state-level anti-lockdown ratio and other demographic factors is presented in the appendix.
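The location-binning step described above amounts to normalizing free-text registration locations to states. Below is a minimal sketch with a hypothetical lookup table covering only the example strings; a real implementation would need a far more complete pattern set for all states.

```python
# Hypothetical lookup table: patterns cover only the example strings above.
STATE_PATTERNS = {
    "CA": ("california", "los angeles", "santa monica", " ca"),
}

def to_state(location):
    # Map a free-text Twitter registration location to a U.S. state code,
    # or None when no pattern matches.
    loc = location.lower()
    for state, patterns in STATE_PATTERNS.items():
        if any(p in loc for p in patterns):
            return state
    return None
```

Tweets whose registration locations resolve to no state would be excluded from the state-level ratios.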
Fig. 3.
Anti-lockdown ratio of the U.S. states (April 21 ~ July 21).
The mean anti-lockdown ratio (April 21 ~ July 21) ranged from a low of 50.9% to a high of 70.3% across the United States. The ten states with the highest anti-lockdown ratio are Alabama (AL, 70.3%), Mississippi (MS, 70.2%), Idaho (ID, 69.8%), Tennessee (TN, 69.6%), North Dakota (ND, 69.1%), West Virginia (WV, 68.0%), Nebraska (NE, 67.4%), Oklahoma (OK, 67.1%), Montana (MT, 66.9%), and South Dakota (SD, 66.3%). The ten states with the lowest anti-lockdown ratio, or the highest pro-lockdown ratio, include Vermont (VT, 50.9%), Maryland (MD, 53.8%), Oregon (OR, 54.6%), Washington (WA, 54.9%), Massachusetts (MA, 55.8%), District of Columbia (DC, 55.8%), New York (NY, 57.0%), Connecticut (CT, 57.3%), California (CA, 57.4%), and Hawaii (HI, 58.0%). Overall, the states in the Northwest, Midwest, and Southeast present a higher anti-lockdown ratio, while the states on the West Coast and Northeast Coast report a lower anti-lockdown ratio.
4.2. Correlation analysis
To demonstrate that the pro/anti-lockdown ratio can reflect people's traveling behaviors and help evaluate the risk of human interactions, this study performed a correlation analysis between the social distancing index and the pro- and anti-lockdown ratios. As previously explained (Section 3.6), the social distancing index is built on the mobile location data and shows people's traveling behaviors in the COVID-19 pandemic. The correlation coefficient is computed using the mathematical formula below.
| $r = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}}$ | (10) |
where $n$ equals 51 in this study, indicating the 50 U.S. states plus the District of Columbia; $x_i$ denotes the pro- or anti-lockdown ratio of each state; and $y_i$ denotes the social distancing index of each state within a specific time period. The correlation plot is presented in Fig. 4, accounting for all the related tweet data between April 21 and July 21. Retweets were not removed, since people who retweet a tweet could hold the same opinion as the original tweet; a tweet with more retweets indicates that more people recognized and shared the viewpoint.
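Equation (10) is the Pearson correlation coefficient, which can be computed with scipy; the two series below are illustrative, not the study's state-level data:

```python
from scipy.stats import pearsonr

# Illustrative state-level series: x_i = anti-lockdown ratio,
# y_i = social distancing index for each state.
anti_ratio = [0.70, 0.66, 0.60, 0.55, 0.51]
distancing_index = [25.0, 28.0, 33.0, 38.0, 41.0]

r, p_value = pearsonr(anti_ratio, distancing_index)  # Pearson's r and p-value
```

Because the illustrative index rises as the ratio falls, `r` comes out strongly negative, mirroring the sign (though not the magnitude) of the study's result.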
Fig. 4.
a. The correlation between the anti-lockdown ratio and social distancing index. b. The correlation between the pro-lockdown ratio and social distancing index.
The correlation coefficient between the anti-lockdown ratio and the social distancing index is −0.55. Since the sum of the pro-lockdown ratio and the anti-lockdown ratio equals 1, the correlation coefficient between the pro-lockdown ratio and the social distancing index is 0.55. The pro/anti-lockdown ratio thus shows a moderate positive/negative correlation with the social distancing index for the study period. In the following analysis, this study only shows the relation between the anti-lockdown ratio and the social distancing index to avoid repetition.
This study further grouped the anti-lockdown tweets on a weekly basis and computed the correlation coefficient for each week. The results are exhibited in Fig. 5 and Table 5 . The weekly state-level anti-lockdown ratio shows a moderate and negative correlation with the social distancing index except for three weeks (June 3 ~ June 9, June 17 ~ June 23, and July 15 ~ July 21, as shown in Table 5).
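The weekly grouping step can be sketched with pandas as follows; the column names and the tiny example table are assumptions for illustration, not the study's data:

```python
import pandas as pd

# Tiny illustrative table of classified tweets.
tweets = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-22", "2020-04-23",
                            "2020-04-22", "2020-04-24"]),
    "state": ["AL", "AL", "VT", "VT"],
    "label": [1, 1, -1, 1],  # 1 = anti-lockdown, -1 = pro-lockdown
})
tweets = tweets[tweets["label"] != 0].copy()  # neutral tweets excluded
tweets["week"] = tweets["date"].dt.to_period("W")

# Weekly anti-lockdown ratio per state: share of class 1 among class 1/-1.
weekly = (tweets.assign(anti=tweets["label"].eq(1))
                .groupby(["week", "state"])["anti"].mean()
                .rename("anti_ratio")
                .reset_index())
```

Each week's state-level `anti_ratio` column would then be correlated with that week's mean social distancing index, as in Table 5.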
Fig. 5.
Correlation between weekly average social distancing and anti-lockdown ratio.
Table 5.
Correlation results on a weekly basis.
| Date | Correlation Coefficient | P-value |
|---|---|---|
| 04/22–04/28 | −0.533 | <0.001* |
| 04/29–05/05 | −0.551 | <0.001* |
| 05/06–05/12 | −0.586 | <0.001* |
| 05/13–05/19 | −0.534 | <0.001* |
| 05/20–05/26 | −0.441 | 0.001* |
| 05/27–06/02 | −0.407 | 0.003* |
| 06/03–06/09 | −0.181 | 0.202 |
| 06/10–06/16 | −0.615 | <0.001* |
| 06/17–06/23 | −0.137 | 0.336 |
| 06/24–06/30 | −0.434 | 0.001* |
| 07/01–07/07 | −0.374 | 0.007* |
| 07/08–07/14 | −0.313 | 0.025* |
| 07/15–07/21 | 0.090 | 0.530 |
(* p-value < 0.05, statistically significant).
A p-value of less than 0.05 is statistically significant, implying that the weekly anti-lockdown ratio is likely to be associated with changes in the social distancing index. For most weeks in the study period, the anti-lockdown ratio presents a moderate negative correlation with the social distancing index: when a state's social distancing index becomes smaller, the anti-lockdown ratio expressed on Twitter from that state tends to be higher. In other words, people from the states with a higher anti-lockdown ratio were more likely to travel outside during and after the lockdown period. This finding supports the credibility of using the anti-lockdown ratio as an indicator to assess the risk of human interactions during the COVID-19 lockdown: states with a higher anti-lockdown ratio tend to have a higher risk of close human interactions than those with a lower anti-lockdown ratio.
According to Table 5, the correlation coefficient is smaller in magnitude and more volatile from June 3 to July 21 (when most states had lifted the lockdown in early June, Fig. 2b) than from April 22 to June 2 (when many states were still under stay-at-home orders, Fig. 2b). This observation further suggests that the anti-lockdown ratio is more aligned with the social distancing index under lockdown orders but less associated once the orders were lifted.
5. Discussion and conclusion
First responders, government entities, and the private sector need information to assess and manage the risk of viral transmission during the pandemic. Current survey-based analyses and mobile data-based systems have provided benefits in assessing the risk of human interactions. However, collecting survey responses and mobile data can be costly and time-consuming. Social media data have the advantages of rapidity, quantity, and spatial coverage and can provide useful information to support the risk assessment. To demonstrate this, the study defined the pro- and anti-lockdown ratios based on Twitter data and explored their associations with the social distancing index and other related metrics (e.g., reported infections, number of states under lockdown). Below are the highlighted findings of this study.
- The trend of the anti-lockdown ratio gradually increased from late April to early June. An extension of lockdown orders might trigger negative emotions expressed on social media. Moreover, the pro-lockdown ratio increased alongside reported COVID-19 infections (Pearson R = 0.59, p < 0.05), implying that the severity of the pandemic could raise people's risk awareness.
- The increase of the anti-lockdown ratio in early June was associated with the Black Lives Matter (BLM) movements. Some people on social media criticized the government for its unequal treatment of BLM protests and anti-lockdown protests. Such voices further cast doubt on the necessity of lockdown orders.
- A negative association was found between the anti-lockdown ratio and the state-level social distancing index (Pearson R = −0.55, p < 0.05); equivalently, the pro-lockdown ratio is positively associated with the social distancing index (Pearson R = 0.55, p < 0.05).
- The study revealed a connection between the opinions expressed on social media and people's behaviors in the pandemic. People from states with a higher anti-lockdown ratio were more likely to travel outside during and after the lockdown period.
In addition, this study provided several insights on using social media data to understand public opinions. It presented a framework for building text classification pipelines and discussed how to address sample imbalance in textual data. The framework can be generalized to quantify the level of perception expressed on social media towards a policy or an event.
For implications, the pro- and anti-lockdown ratios reflect people's understandings and compliances with the rules and regulations during the pandemic, and thus can be used as indicators to evaluate the risk of human interactions in the U.S. states. This social media-based approach is practical and operable since it has the advantages of instantaneity, cost-efficiency, and spatial coverage. The model developed in this study can supplement current mobile location-based monitoring systems and provide government agencies, health officials, and the residents an additional instrument to assess such risk.
Despite the aforementioned benefits, the study has some limitations. First, the approach may not be applicable for long-term use if the selected topics are no longer discussed on social media. For example, if the discussion on lockdown is discontinued and inactive, the few remaining tweets on the subject may sway the results. Second, the anti-lockdown ratio is only used to compare risk across U.S. states. In other words, a higher anti-lockdown ratio observed on a specific day does not necessarily imply a higher risk of human interactions on that day, since the attitude expressed on social media can be primarily affected by related news and policies. For example, the BLM movements contributed to an increase in the anti-lockdown ratio in early June. Third, the anti-lockdown ratio computed in this study is higher than the polling results, possibly due to the limited capability of the trained model and the inherently imperfect quality of social media data. Ongoing and further work will improve the text classification models by preparing more training samples, especially pro-lockdown tweets, and deploying models with more advanced learning techniques (e.g., recurrent neural networks). Another research topic will focus on improving the data cleaning process, such as excluding bot accounts, to make the results better represent real-world opinions.
Author contributions
L. L. contributed to Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Visualization, Writing – original draft, Writing – review & editing, and revising the manuscript. Z. M. contributed to Conceptualization, Data curation, Resources, Writing – original draft, Writing – review, and revising the manuscript. H. L. contributed to Writing – review & editing and revising the manuscript. S. L. contributed to Conceptualization, Investigation, Methodology, Project administration, Resources, Writing – original draft, and revising the manuscript.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors acknowledge the help of Maryland Transportation Institute at University of Maryland to provide the social distancing index data.
Appendix.
This study extended the spatial analysis to examine the associations of the state-level anti-lockdown ratio with geodemographic factors. The selected geodemographic factors include educational level (bachelor's degree %) [68], health (health value 2018) [69], unemployment change (unemployment change from May 2019 to May 2020) [70], household income (average household income 2018) [71], party affiliation (net democratic) [72], economy (GDP per capita 2019) [73], ethnic group (non-white percentage 2018) [74], age (median age 2018) [75], and gender (male to female ratio 2018) [76]. The correlation results are exhibited in Fig. 6.
Fig. 6.
Correlation analysis with selected geodemographic factors. a. Correlation with bachelor's degree % (R = −0.676, p < 0.001). b. Correlation with health value (R = −0.597, p < 0.001). c. Correlation with unemployment change rate (R = 0.463, p < 0.001). d. Correlation with average household income (R = −0.600, p < 0.001). e. Correlation with net democratic (R = −0.718, p < 0.001). f. Correlation with GDP per capita (R = −0.585, p < 0.001). g. Correlation with non-white % (R = −0.147, p = 0.310). h. Correlation with median age (R = −0.194, p = 0.177). i. Correlation with male to female ratio (R = 0.166, p = 0.248).
According to Fig. 6, the state-level anti-lockdown ratio displays a moderate to strong negative correlation with health value (R = −0.597, p < 0.001), bachelor's degree (R = −0.676, p < 0.001), net democratic (R = −0.718, p < 0.001), GDP per capita (R = −0.585, p < 0.001), and average household income (R = −0.600, p < 0.001). People in states with a higher health value, higher educational level, higher GDP per capita, higher average household income, and a more Democratic inclination were more likely to support the lockdown policy. Moreover, people were more likely to oppose the lockdown policy in states that experienced a larger increase in unemployment, since the state-level ratio shows a moderate positive correlation with unemployment change (R = 0.463, p < 0.001). Among the investigated factors, the anti-lockdown ratio does not show a significant correlation with median age, male to female ratio, or non-white ratio, as demonstrated by p-values > 0.050. One interesting observation from Fig. 6 is that the state-level perception shows a moderate to strong correlation with socioeconomic or political factors (e.g., education, party affiliation, health, income, GDP per capita), a moderate correlation with pandemic-related factors (e.g., unemployment change rate), but no clear correlation with demographic attributes (e.g., gender, age, ethnic group).
References
- 1.Coronavirus (COVID-19) Google news. https://news.google.com/covid19/map?hl=en-US&gl=US&ceid=US:en
- 2.Centers for Disease Control and Prevention (CDC) Coronavirus disease 2019 (COVID-19) - how to protect yourself & others. https://www.cdc.gov/coronavirus/2019-ncov/prevent-getting-sick/prevention.html
- 3.COVID-19 pandemic lockdowns . Aug. 21, 2020. Wikipedia.https://en.wikipedia.org/w/index.php?title=COVID-19_pandemic_lockdowns&oldid=974241172 [Online]. Available. [Google Scholar]
- 4.Centers for Disease Control and Prevention (CDC) Coronavirus disease 2019 (COVID-19) - frequently asked questions. https://www.cdc.gov/coronavirus/2019-ncov/faq.html
- 5.Jackson C., Newall M. Most Americans hopeful COVID-19 will be under control in six months, yet see federal government as making things worse. https://www.ipsos.com/en-us/news-polls/axios-ipsos-coronavirus-index
- 6.Coughlan M., Cronin P., Ryan F. Survey research: process and limitations. Int. J. Ther. Rehabil. Jan. 2009;16(1):9–15. doi: 10.12968/ijtr.2009.16.1.37935. [DOI] [Google Scholar]
- 7.Maryland Transportation Institute University of Maryland COVID-19 impact analysis platform. https://data.covid.umd.edu
- 8.Walle T. The Unacast social distancing scoreboard. https://www.unacast.com/post/the-unacast-social-distancing-scoreboard
- 9.Al-Hasan A., Yim D., Khuntia J. Citizens' adherence to COVID-19 mitigation recommendations by the government: a 3-country comparative evaluation using web-based cross-sectional survey data. J. Med. Internet Res. Aug. 2020;22(8):e20634. doi: 10.2196/20634. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Malecki K.M.C., Keating J.A., Safdar N. Crisis communication and public perception of COVID-19 risk in the era of social media. Clin. Infect. Dis. Jun. 2020:ciaa758. doi: 10.1093/cid/ciaa758. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Basch C.H., Hillyer G.C., Meleo-Erwin Z.C., Jaime C., Mohlman J., Basch C.E. Preventive behaviors conveyed on YouTube to mitigate transmission of COVID-19: cross-sectional study. JMIR Public Health Surveill. Apr. 2020;6(2):e18807. doi: 10.2196/18807. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.2020 United States Anti-lockdown Protests. Wikipedia; Aug. 18, 2020. https://en.wikipedia.org/w/index.php?title=2020_United_States_anti-lockdown_protests&oldid=973718159 [Online]. Available. [Google Scholar]
- 13.Kim S.C., Hawkins K.H. The psychology of social media communication in influencing prevention intentions during the 2019 U.S. measles outbreak. Comput. Hum. Behav. Oct. 2020;111:106428. doi: 10.1016/j.chb.2020.106428. [DOI] [Google Scholar]
- 14.Yang Y., Sun Y. Public voice via social media: role in cooperative governance during public health emergency. Int. J. Environ. Res. Publ. Health. Sep. 2020;17(18) doi: 10.3390/ijerph17186840. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Shan S., Yan Q., Wei Y. Infectious or recovered? Optimizing the infectious disease detection process for epidemic control and prevention based on social media. Int. J. Environ. Res. Publ. Health. Sep. 2020;17(18) doi: 10.3390/ijerph17186853. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Lim S., Tucker C.S., Kumara S. An unsupervised machine learning model for discovering latent infectious diseases using social media data. J. Biomed. Inf. Feb. 2017;66:82–94. doi: 10.1016/j.jbi.2016.12.007. [DOI] [PubMed] [Google Scholar]
- 17.Lee K., Agrawal A., Choudhary A. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’13, Chicago, Illinois, USA. 2013. Real-time disease surveillance using Twitter data: demonstration on flu and cancer; p. 1474. [DOI] [Google Scholar]
- 18.Aramaki E., Maskawa S., Morita M. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Jul. 2011. Twitter catches the flu: detecting influenza epidemics using twitter; pp. 1568–1576. https://www.aclweb.org/anthology/D11-1145 [Online]. Available. [Google Scholar]
- 19.Ginsberg J., Mohebbi M.H., Patel R.S., Brammer L., Smolinski M.S., Brilliant L. Detecting influenza epidemics using search engine query data. Nature. Feb. 2009;457(7232):1012–1014. doi: 10.1038/nature07634. [DOI] [PubMed] [Google Scholar]
- 20.Li C., Chen L.J., Chen X., Zhang M., Pang C.P., Chen H. Retrospective analysis of the possibility of predicting the COVID-19 outbreak from Internet searches and social media data, China, 2020. Euro Surveill. Mar. 2020;25(10) doi: 10.2807/1560-7917.ES.2020.25.10.2000199. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Classroom Mental Health https://classroommentalhealth.org/in-class/thoughts/
- 22.Falco A., Piccirelli A., Girardi D., Dal Corso L., De Carlo N.A. Risky riding behavior on two wheels: the role of cognitive, social, and personality variables among young adolescents. J. Saf. Res. Sep. 2013;46:47–57. doi: 10.1016/j.jsr.2013.03.002. [DOI] [PubMed] [Google Scholar]
- 23.Brandmiller C., Dumont H., Becker M. Teacher perceptions of learning motivation and classroom behavior: the role of student characteristics. Contemp. Educ. Psychol. Oct. 2020;63:101893. doi: 10.1016/j.cedpsych.2020.101893. [DOI] [Google Scholar]
- 24.Fan J., Wei X., Ko I. How do hotel employees' feeling trusted and its differentiation shape service performance: the role of relational energy. Int. J. Hospit. Manag. Jan. 2021;92:102700. doi: 10.1016/j.ijhm.2020.102700. [DOI] [Google Scholar]
- 25.Kim D., Hyun H., Park J. The effect of interior color on customers' aesthetic perception, emotion, and behavior in the luxury service. J. Retailing Consum. Serv. Nov. 2020;57:102252. doi: 10.1016/j.jretconser.2020.102252. [DOI] [Google Scholar]
- 26.Ward J.C., Barnes J.W. Control and affect: the influence of feeling in control of the retail environment on affect, involvement, attitude, and behavior. J. Bus. Res. Nov. 2001;54(2):139–144. doi: 10.1016/S0148-2963(99)00083-1. [DOI] [Google Scholar]
- 27.Bode L. Feeling the pressure: attitudes about volunteering and their effect on civic and political behaviors. J. Adolesc. Jun. 2017;57:23–30. doi: 10.1016/j.adolescence.2017.03.004. [DOI] [PubMed] [Google Scholar]
- 28.Kim J., Kim B.-J., Kim N. Perception-based analytical technique of evacuation behavior under radiological emergency: an illustration of the Kori area. Nucl. Eng. Technol. Aug. 2020 doi: 10.1016/j.net.2020.08.012. [DOI] [Google Scholar]
- 29.Cojuharenco I., Cornelissen G., Karelaia N. Yes, I can: feeling connected to others increases perceived effectiveness and socially responsible behavior. J. Environ. Psychol. Dec. 2016;48:75–86. doi: 10.1016/j.jenvp.2016.09.002. [DOI] [Google Scholar]
- 30.Abel M., Byker T., Carpenter J. Socially optimal mistakes? Debiasing COVID-19 mortality risk perceptions and prosocial behavior. J. Econ. Behav. Organ. Jan. 2021 doi: 10.1016/j.jebo.2021.01.007. [DOI] [Google Scholar]
- 31.Bruine de Bruin W., Bennett D. Relationships between initial COVID-19 risk perceptions and protective health behaviors: a national survey. Am. J. Prev. Med. Aug. 2020;59(2):157–167. doi: 10.1016/j.amepre.2020.05.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Parady G., Taniguchi A., Takami K. Travel behavior changes during the COVID-19 pandemic in Japan: analyzing the effects of risk perception and social influence on going-out self-restriction. Transp. Res. Interdiscip. Perspect. Sep. 2020;7:100181. doi: 10.1016/j.trip.2020.100181. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Graham A., Kremarik F., Kruse W. Attitudes of ageing passengers to air travel since the coronavirus pandemic. J. Air Transport. Manag. Aug. 2020;87:101865. doi: 10.1016/j.jairtraman.2020.101865. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Li Y., et al. Constructing and communicating COVID-19 stigma on twitter: a content analysis of tweets during the early stage of the COVID-19 outbreak. Int. J. Environ. Res. Publ. Health. Sep. 2020;17(18) doi: 10.3390/ijerph17186847. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Nabity-Grover T., Cheung C.M.K., Thatcher J.B. Inside out and outside in: how the COVID-19 pandemic affects self-disclosure on social media. Int. J. Inf. Manag. Jun. 2020:102188. doi: 10.1016/j.ijinfomgt.2020.102188. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Zhong B., Huang Y., Liu Q. Mental health toll from the coronavirus: social media usage reveals Wuhan residents' depression and secondary trauma in the COVID-19 outbreak. Comput. Hum. Behav. Jan. 2021;114:106524. doi: 10.1016/j.chb.2020.106524. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Lin C.-Y., Broström A., Griffiths M.D., Pakpour A.H. Investigating mediated effects of fear of COVID-19 and COVID-19 misunderstanding in the association between problematic social media use, psychological distress, and insomnia. Internet Interv. Sep. 2020;21:100345. doi: 10.1016/j.invent.2020.100345. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Johnson N.F., et al. The online competition between pro- and anti-vaccination views. Nature. Jun. 2020;582(7811):230–233. doi: 10.1038/s41586-020-2281-1. [DOI] [PubMed] [Google Scholar]
- 39.Li D., Chaudhary H., Zhang Z. Modeling spatiotemporal pattern of depressive symptoms caused by COVID-19 using social media data mining. Int. J. Environ. Res. Publ. Health. Jul. 2020;17(14):4988. doi: 10.3390/ijerph17144988. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Zhao Y., Cheng S., Yu X., Xu H. Chinese public's attention to the COVID-19 epidemic on social media: observational descriptive study. J. Med. Internet Res. May 2020;22(5):e18825. doi: 10.2196/18825. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Hou Z., Du F., Jiang H., Zhou X., Lin L. Assessment of public attention, risk perception, emotional and behavioural responses to the COVID-19 outbreak: social media surveillance in China. SSRN Electron. J. 2020 doi: 10.2139/ssrn.3551338. [DOI] [Google Scholar]
- 42.Nelson L.M., et al. U.S. Public concerns about the COVID-19 pandemic from results of a survey given via social media. JAMA Intern. Med. Jul. 2020;180(7):1020. doi: 10.1001/jamainternmed.2020.1369. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Czeisler M.É., et al. COVID-19: public compliance with and public support for stay-at-home mitigation strategies. Infectious Diseases (except HIV/AIDS), preprint; Apr. 2020. [DOI] [Google Scholar]
- 44.McCann A. Social distancing survey: 36 million Americans' #1 way to cope is online shopping. https://wallethub.com/blog/social-distancing-survey/73704/
- 45.Ngo M. Rounding out the social distancing scoreboard. https://www.unacast.com/post/rounding-out-the-social-distancing-scoreboard
- 46.Kim S. Best and worst states at social distancing revealed. https://www.newsweek.com/best-worst-states-social-distancing-revealed-1501022
- 47.Lamsal R. IEEE DataPort; Mar. 13, 2020. Corona Virus (COVID-19) Tweets Dataset (en) [DOI] [Google Scholar]
- 48.Bird S., Loper E., Klein E. O'Reilly Media Inc.; 2009. Natural Language Processing with Python. [Google Scholar]
- 49.Wei J., Zou K. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China. 2019. EDA: easy data augmentation techniques for boosting performance on text classification tasks; pp. 6381–6387. [DOI] [Google Scholar]
- 50.Rajaraman A., Ullman J.D. Cambridge University Press; New York, N.Y.; Cambridge: 2012. Mining of Massive Datasets. [Google Scholar]
- 51.Ramos J. Using TF-IDF to determine word relevance in document queries. [Google Scholar]
- 52.Pedregosa F., et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 2011;12:2825–2830. [Google Scholar]
- 53.Ho T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. Aug. 1998;20(8):832–844. doi: 10.1109/34.709601. [DOI] [Google Scholar]
- 54.Rish I. 2003. An Empirical Study of the Naive Bayes Classifier; p. 6. [Google Scholar]
- 55.Manning C.D., Raghavan P., Schütze H. Cambridge University Press; New York: 2008. Introduction to Information Retrieval. [Google Scholar]
- 56.Cortes C., Vapnik V. Support-vector networks. Mach. Learn. Sep. 1995;20(3):273–297. doi: 10.1007/BF00994018. [DOI] [Google Scholar]
- 57.Kleinbaum D.G., Klein M. Springer New York; New York, NY: 2010. Logistic Regression. [Google Scholar]
- 58.Kingma D.P., Ba J. Adam: a method for stochastic optimization. ArXiv14126980 Cs. Jan. 2017 http://arxiv.org/abs/1412.6980 [Online]. Available. [Google Scholar]
- 59.Lever J., Krzywinski M., Altman N. Classification evaluation. Nat. Methods. Aug. 2016;13(8):603–604. doi: 10.1038/nmeth.3945. [DOI] [Google Scholar]
- 60.Bail C.A., et al. Exposure to opposing views on social media can increase political polarization. Proc. Natl. Acad. Sci. Unit. States Am. Sep. 2018;115(37):9216–9221. doi: 10.1073/pnas.1804840115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Monmouth University Polling Institute. President inconsistent on COVID. May 05, 2020. https://www.monmouth.edu/polling-institute/reports/monmouthpoll_us_050520/
- 62.Beaumont T., Fingerhut H. AP-NORC poll: few Americans support easing virus protections. AP NEWS. Apr. 22, 2020 https://apnews.com/article/9ed271ca13012d3b77a2b631c1979ce1 [Google Scholar]
- 63.Mann B. NPR.org; Aug. 04, 2020. Despite Mask Wars, Americans Support Aggressive Measures to Stop COVID-19, Poll Finds.https://www.npr.org/2020/08/04/898522180/despite-mask-wars-americans-support-aggressive-measures-to-stop-covid-19-poll-fi [Google Scholar]
- 64.Coronavirus closures: map of where U.S. states are tightening restrictions. https://www.usatoday.com/storytelling/coronavirus-reopening-america-map/
- 65.The COVID Tracking Project. Data download. https://covidtracking.com/data/download
- 66.Carnegie Mellon School of Computer Science. Nearly half of the Twitter accounts discussing 'reopening America' may be bots. May 20, 2020. https://www.cs.cmu.edu/news/nearly-half-twitter-accounts-discussing-reopening-america-may-be-bots
- 67.ABC News. Reopening the country seen as greater risk among most Americans: POLL. May 08, 2020. https://abcnews.go.com/Politics/reopening-country-greater-risk-americans-poll/story?id=70555060 [Google Scholar]
- 68.List of U.S. States and Territories by Educational Attainment. Wikipedia; Jun. 29, 2020. https://en.wikipedia.org/w/index.php?title=List_of_U.S._states_and_territories_by_educational_attainment&oldid=965165403 [Online]. Available. [Google Scholar]
- 69.America's Health Rankings. Findings state rankings | 2018 annual report. https://www.americashealthrankings.org/learn/reports/2018-annual-report/findings-state-rankings
- 70.State Employment and Unemployment Summary https://www.bls.gov/news.release/laus.nr0.htm
- 71.List of U.S. States and Territories by Income. Wikipedia; Mar. 29, 2020. https://en.wikipedia.org/w/index.php?title=List_of_U.S._states_and_territories_by_income&oldid=947939623 [Online]. Available. [Google Scholar]
- 72.Gallup Inc. Democratic states exceed Republican states by four in 2018. Gallup.com. Feb. 22, 2019 https://news.gallup.com/poll/247025/democratic-states-exceed-republican-states-four-2018.aspx [Google Scholar]
- 73.List of States and Territories of the United States by GDP. Wikipedia. https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States_by_GDP
- 74.Population Distribution by Race/Ethnicity | KFF. https://www.kff.org/other/state-indicator/distribution-by-raceethnicity/?currentTimeframe=0&sortModel=%7B%22colId%22%3A%22Location%22%2C%22sort%22%3A%22asc%22%7D
- 75.List of U.S. States and Territories by Median Age. Wikipedia; Jun. 29, 2020. https://en.wikipedia.org/w/index.php?title=List_of_U.S._states_and_territories_by_median_age&oldid=965073937 [Online]. Available. [Google Scholar]
- 76.U.S. population: male to female ratio, by state 2018 | Statista. https://www.statista.com/statistics/301946/us-population-males-per-100-females-by-state/