Abstract
We investigated the utility of Twitter for conducting multi-faceted geolocation-centric pandemic surveillance, using India as an example. We collected over 4 million COVID-19-related tweets about the Indian outbreak between January and July 2021. We geolocated the tweets, applied natural language processing to characterize them (e.g., identifying symptoms and emotions), and compared tweet volumes with the numbers of confirmed COVID-19 cases. Tweet numbers closely mirrored the outbreak, with the 7-day average strongly correlated with confirmed COVID-19 cases nationally (Spearman r=0.944, p=0.001) and at the state level (Spearman r=0.84, p=0.0003). Fatigue, dyspnea and cough were the top symptoms detected, and there was a significant increase in the proportion of tweets expressing negative emotions (e.g., fear and sadness). The surge in COVID-19 tweets was followed by an increased number of posts expressing concern about black fungus and oxygen supply. Our study illustrates the potential of social media for multi-faceted pandemic surveillance.
Introduction
The global outbreak of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been one of the worst pandemics in world history1, resulting in adverse social, political and economic consequences around the globe2. Due to the unprecedented nature of this pandemic, traditional public health surveillance methods struggled to cope, and different governments around the world adopted different surveillance strategies3–5. As of August 2021, COVID-19 outbreaks continue globally, with the highly infectious B.1.617.2 (delta) variant being the driving force behind the waves of outbreaks at the time6–8. This variant, which was first detected in India, ravaged the country, resulting in the largest national lockdown in the world9. As of 20th July, 2021, India had reported over 31 million confirmed cases and over 400,000 deaths. The rapid spread of the outbreak in India and other countries exposed the weaknesses of traditional surveillance systems, showing that most traditional mechanisms were not designed to meet the challenges of this pandemic10. For example, national surveillance methods based on testing symptomatic people, which worked effectively for certain countries, are unrealistic for others10,11. The latest research suggests that the vaccines developed for COVID-19 are effective against the delta variant6, but considering the unvaccinated population across the world, it is likely that outbreaks caused by this variant will continue. Even within largely vaccinated communities, many breakthrough infections due to this variant have been reported12. There is also a possibility of new variants of the virus emerging over time, causing future outbreaks, perhaps at faster rates. Consequently, there is a need to identify and deploy novel surveillance methods that can complement traditional surveillance approaches13.
One potential resource for conducting effective, real-time surveillance of COVID-19 is social media. Currently, social media adoption and usage around the globe is at an all-time high14, and social media has a global reach, with hundreds of millions of monthly active users. Despite its potential, social media has been largely underutilized for conducting close to real-time surveillance. While social media has potential negative aspects, which have been highlighted in recent literature15, the opportunities it presents have not been explored sufficiently. Due to the huge global user base of social media, and the recent advances in data-driven information management approaches, such as natural language processing (NLP) and machine learning, this resource may enable the real-time monitoring of localized outbreaks. In addition to detecting outbreaks, the knowledge generated from social media may be used for conducting syndromic surveillance and understanding population-level perceptions about the pandemic. The knowledge generated over social media has been utilized in recent research for a variety of tasks such as sentiment analysis16, pharmacovigilance17, toxicovigilance18, studying mental health-related topics19, and other health-related tasks20.
In this study, we explored the potential utilities of social media for close to real-time pandemic surveillance, using the recent outbreak caused by the delta variant in India as an example. We specifically utilized the data generated on Twitter, and applied NLP methods to explore utilities beyond simple outbreak detection. Our retrospective analyses of Twitter data illustrate that it may potentially be used for gaining near real-time insights about various aspects of the pandemic. In this paper, we discuss the following possibilities:
Utilizing the volume of COVID-19 data generated on Twitter, combined with geolocation-related metadata, for detecting outbreaks.
Conducting syndromic surveillance through the active monitoring of user-reported symptoms on Twitter via the use of NLP.
Assessing the mental/psychological impacts of outbreaks via automated emotion analysis of COVID-19-related Twitter chatter.
Detecting emerging concerns related to the pandemic in targeted populations via automated Twitter chatter analysis.
We present our methods and findings in the following subsections. We have also made our findings publicly available via an interactive, web-based dashboard.†
Materials and methods
Study setting
Our study is based on data generated publicly on the social network Twitter. Twitter is one of the most popular social networks in the world, with close to 400 million monthly active users in 202121. Twitter is a particularly attractive resource for real-time data analysis because most of the data on this platform is public. Posts on Twitter, referred to as ‘tweets’, are essentially ‘microblogs’ and are typically publicly available. Posts on Twitter may also have associated meta-data, which can be leveraged to obtain additional information, such as geolocation-specific statistics.
Data collection
We collected data from Twitter via its streaming application programming interface (API) that was specifically created for conducting COVID-19 research22. The COVID-19 stream API was released by the company to enable researchers to study the conversation surrounding COVID-19, and the authors of this manuscript were granted special permission to access the stream. Unlike the traditional public streaming API, which only provides access to a sample of tweets posted at any time, the COVID-19 API stream delivers all the conversations about the topic without any rate limitations. We collected all posts in English from this API using COVID-19-related keywords (e.g., ‘COVID’, ‘COVID19’, ‘corona virus’),‡ and utilized the metadata associated with the tweets to geolocate their sources, specifically to identify tweets that originated from India. When available, we used the geolocation coordinates of tweets to identify the specific region of India from which each tweet originated. To understand the global response, we also collected data about India during this period by adding a second filter layer containing the term ‘India’. For the experiments described in this paper, we used data collected in this manner from the beginning of 2021 until July 2021. The streamed data were stored in a MongoDB database23, and NLP methods were applied to derive insights from the data.
Data analyses
We conducted analyses to explore several aspects of the Indian outbreak, specifically (i) outbreak timeline and location analysis, (ii) real-time syndromic surveillance, (iii) population-level emotion analysis during the outbreak, and (iv) emerging topic detection. We describe the methods applied for the specific analyses in the following subsections.
Outbreak timeline and location
We used the volume of COVID-19-related tweets geolocated to India over time to track the outbreak timeline. We compared the timeline with other events in India (e.g., the opening of public places such as shopping complexes and movie theatres). Whenever possible, we mapped the tweets to the states from which they were posted. We used two methods to detect the geolocation origins of the tweets. First, tweets that had geolocation coordinates available were mapped to the specific state in India. Second, for tweets that did not have geolocation coordinates in the meta-data, we used an existing package called geo-carmen24. This package uses the meta-data provided by the Twitter API to infer the location of a tweet from its geo-tagging information, the user profile, etc. Where possible, the geolocation information is further resolved to the country, state and county levels. This helps in identifying precise locations and patterns specific to certain areas.
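The two-step geolocation logic described above (coordinates first, other meta-data as a fallback) can be sketched as follows. This is a minimal illustration and not the geo-carmen implementation: the bounding boxes are rough, assumed values for two states only, the function names are hypothetical, and the tweet dictionary is assumed to follow the classic Twitter JSON layout with a GeoJSON `coordinates` field and a free-text `user.location` field.

```python
from typing import Optional

# Rough, illustrative bounding boxes: (min_lat, max_lat, min_lon, max_lon).
# These are assumptions for the sketch, not authoritative state boundaries.
# The boxes overlap near the border; real use would need state polygons.
STATE_BOXES = {
    "Maharashtra": (15.6, 22.1, 72.6, 80.9),
    "Karnataka": (11.5, 18.5, 74.0, 78.6),
}


def state_from_coordinates(lat: float, lon: float) -> Optional[str]:
    """Return the first state whose bounding box contains the point."""
    for state, (min_lat, max_lat, min_lon, max_lon) in STATE_BOXES.items():
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            return state
    return None


def state_from_profile(location_text: str) -> Optional[str]:
    """Fallback: look for a state name in the free-text profile location."""
    lowered = location_text.lower()
    for state in STATE_BOXES:
        if state.lower() in lowered:
            return state
    return None


def resolve_state(tweet: dict) -> Optional[str]:
    """Prefer exact coordinates; otherwise fall back to profile meta-data."""
    coords = tweet.get("coordinates")
    if coords:
        lon, lat = coords["coordinates"]  # GeoJSON order: [longitude, latitude]
        state = state_from_coordinates(lat, lon)
        if state:
            return state
    user = tweet.get("user") or {}
    return state_from_profile(user.get("location") or "")
```

For example, a tweet with coordinates near Mumbai resolves through the first step, while a tweet without coordinates whose profile reads "Bengaluru, Karnataka" resolves through the fallback.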
We additionally used a previously-developed machine learning classifier to detect tweets that represented self-reports of COVID-19 positive tests (i.e., users who reported that they had tested positive for COVID-19)25. The classifier was trained on Twitter posts that mentioned COVID-19-related keywords and had been manually annotated as self-reports or otherwise. To prepare the tweet texts for this classifier, we performed some basic preprocessing. We first tokenized the texts by breaking lines into words, called tokens, and then converted all the data into lower case. We also removed stopwords, punctuation, numbers, extra spaces, and special characters using the “cleantext” package. The detection of self-reports was modeled as a binary classification task, and we applied a state-of-the-art method called bidirectional encoder representations from transformers (BERT)26 to the manually annotated data.
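The preprocessing steps listed above can be approximated with the Python standard library alone. The sketch below is an assumption-laden stand-in for the “cleantext” package rather than its actual behavior: the stopword list is a tiny illustrative one, the function name `preprocess` is hypothetical, and lowercasing is applied before tokenization for simplicity.

```python
import re
from typing import List

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "i", "in", "is", "of", "the", "to", "was"}


def preprocess(text: str) -> List[str]:
    """Lowercase, strip non-letters, tokenize on whitespace, drop stopwords."""
    lowered = text.lower()
    # Replace punctuation, digits and special characters with spaces.
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    # split() with no argument also collapses runs of extra spaces.
    tokens = letters_only.split()
    return [token for token in tokens if token not in STOPWORDS]
```

Note that removing numbers turns a keyword such as "COVID-19" into the single token "covid", which is one reason the cleaning rules need to be chosen with the downstream classifier in mind.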
In addition to tracking posts originating from India, we also tracked the global response to the Indian COVID-19 outbreak by identifying tweets from outside India that contained both COVID-19-related keywords and references to India. We geolocated tweets at the country level to identify which countries, other than India, had high interest in the outbreak.
Syndromic surveillance
We applied a previously-developed COVID-19 Twitter symptom lexicon27 to detect specific symptoms that were reported by users from India. The lexicon contains non-standard expressions and misspellings that are commonly found in social media data. To detect symptoms from the text, we applied NLP to perform inexact matching. This enabled us to detect symptom expressions that were lexically similar to those encoded in the lexicon, but not necessarily identical. The lexicon was used to map symptom expressions to standardized IDs in the Unified Medical Language System (UMLS)28. We then computed the frequency of each symptom.
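Inexact lexicon matching of this kind can be illustrated with a sliding window and a string-similarity threshold. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; the study does not specify its matching algorithm, so this is only one plausible approach. The lexicon entries, the concept identifiers (placeholders, not real UMLS IDs) and the threshold of 0.85 are all assumptions made for illustration.

```python
import difflib
from collections import Counter

# Toy lexicon: expression -> placeholder concept ID (not real UMLS identifiers).
LEXICON = {
    "shortness of breath": "SYM_DYSPNEA",
    "cough": "SYM_COUGH",
    "fatigue": "SYM_FATIGUE",
    "tired": "SYM_FATIGUE",
}


def find_symptoms(text, threshold=0.85):
    """Return the set of concept IDs whose expressions approximately occur in text."""
    tokens = text.lower().split()
    found = set()
    for expression, concept in LEXICON.items():
        width = len(expression.split())
        # Slide a window of the same word length as the lexicon expression.
        for start in range(len(tokens) - width + 1):
            window = " ".join(tokens[start:start + width])
            ratio = difflib.SequenceMatcher(None, window, expression).ratio()
            if ratio >= threshold:
                found.add(concept)
                break
    return found


def symptom_frequencies(texts, threshold=0.85):
    """Count, per concept, the number of texts in which it was detected."""
    counts = Counter()
    for text in texts:
        counts.update(find_symptoms(text, threshold))
    return counts
```

With this setup, misspellings such as "cogh" or "shortnes of breath" still map to the intended concepts, while unrelated words fall below the threshold. A lower threshold catches more variants at the cost of more false positives.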
Emotion analysis
Our intent was to analyze the emotions expressed in tweets before, during and after the outbreak, and thereby to detect changes in emotions over time. We performed linguistic emotion analysis of the tweets using the lexicon curated by the National Research Council, Canada, which contains a comprehensive list of approximately 14,182 English words associated with anger, fear, anticipation, trust, surprise, sadness, joy and disgust, as well as with negative and positive sentiment29. In addition, we quantified the aggregated anxiety levels expressed in the tweets over time. We sought to identify changes in users’ emotions between the pre-outbreak and post-outbreak periods. We considered January and February as the pre-outbreak period, and March, April and May as the post-outbreak period. The intensity of each emotion was measured on a scale from 0 to 1 at the tweet level, where 0 is the lowest and 1 the highest.
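A lexicon-based emotion analysis with a pre-/post-outbreak split can be sketched as below. The word entries and intensity values are hypothetical and are not taken from the actual lexicon, and the tweet-level aggregation rule (the mean intensity of matched words per emotion) is an assumption, since the text does not state how word-level scores were combined.

```python
from collections import Counter, defaultdict

# Hypothetical entries: word -> {emotion: intensity in [0, 1]}.
TOY_LEXICON = {
    "scared": {"fear": 0.75},
    "worried": {"fear": 0.25},
    "angry": {"anger": 0.5},
    "grief": {"sadness": 0.75},
}


def tweet_emotions(text):
    """Mean intensity of matched lexicon words, per emotion, for one tweet."""
    totals, matches = defaultdict(float), defaultdict(int)
    for word in text.lower().split():
        for emotion, intensity in TOY_LEXICON.get(word, {}).items():
            totals[emotion] += intensity
            matches[emotion] += 1
    return {emotion: totals[emotion] / matches[emotion] for emotion in totals}


def period_of(month):
    """Pre-outbreak: January-February; post-outbreak: March-May (as in the text)."""
    if month in (1, 2):
        return "pre"
    if month in (3, 4, 5):
        return "post"
    return None


def emotion_counts_by_period(dated_tweets):
    """Count tweets expressing each emotion; input is (month_number, text) pairs."""
    counts = {"pre": Counter(), "post": Counter()}
    for month, text in dated_tweets:
        period = period_of(month)
        if period is not None:
            counts[period].update(tweet_emotions(text).keys())
    return counts
```

Comparing `counts["post"]` with `counts["pre"]`, ideally normalized by the number of tweets in each period, gives the kind of before/after contrast described above.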
Detecting emerging concerns
We used frequency distributions to identify emerging concerns and interests associated with the outbreak. For detectable emerging concerns, we tracked their distribution over time by tracking the frequencies of word bigrams and trigrams (Figure 1). Bigrams and trigrams are instances of n-grams, which are contiguous sequences of n words. This led us to discover multiple topics: black fungus, a disease that saw widespread incidence in India following the COVID-19 outbreak, and vaccine-related chatter, specifically represented by the keywords ‘CoviShield’ and ‘CoVaxin’, the two vaccines that were available in India at the time.
Figure 1.
Most frequently occurring bi-grams in our COVID-19-related Twitter data.
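The n-gram frequency counting behind this kind of analysis can be sketched in a few lines. The function names are hypothetical, tokenization here is a plain whitespace split, and the example texts used to exercise it are invented for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-word sequences from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(texts, n=2, k=3):
    """Count n-grams across texts and return the k most frequent ones."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return counts.most_common(k)
```

Running the same count separately for each day or week, rather than over the whole corpus, yields the per-topic timelines used to track a concern as it emerges.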
Results
Outbreak and location
Between January and July 2021, we collected over 4 million tweets about the outbreak in India, of which over 500,000 were geolocated to India, with 9,700 having specific geolocation coordinates. Globally, 3.56 million tweets about India were posted from other countries. Figure 2 presents the timeline of COVID-19 tweets geolocated to India and posted between early January and early June 2021. The figure also shows the timeline of daily confirmed COVID-19 cases in India, and the timelines for two important national events: the opening of public places such as shopping centers, and state-level elections. The figure shows that the daily volume of tweets closely followed the number of confirmed COVID-19 cases. We also found a statistically significant correlation (Spearman r=0.944, p=0.001) between the 7-day moving average of the tweet count and the 7-day moving average of reported COVID-19 cases.
Figure 2.
Comparison of weekly volume of COVID-19 related tweets from India (top) and the number of confirmed COVID-19 cases per day (bottom).
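The comparison of 7-day moving averages via Spearman correlation can be reproduced in outline as follows. This is a standard-library sketch with hypothetical function names; it computes only the correlation coefficient (no p-value) and assumes both series have some variation, since a constant series would make the denominator zero.

```python
def moving_average(values, window=7):
    """Trailing moving average; the first window-1 positions are dropped."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

In use, the daily tweet counts and daily case counts would each be passed through `moving_average` before calling `spearman` on the two smoothed series.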
Figure 3 shows the state-level distribution of tweets during this timeframe. Darker shades represent higher numbers of tweets. State-level COVID-19 case numbers were significantly correlated with the volume of Twitter data available (Spearman r = 0.84, p = 0.0003). The highest numbers of tweets were from the states of Maharashtra (~24%), Karnataka (~11.5%), Uttar Pradesh (~7.5%) and Tamil Nadu (~7%). The correlation between tweet count and recorded COVID-19 cases for these top 4 states is also strong, but not statistically significant (Spearman r = 0.8, p = 0.200) due to the low number of available data points. Maharashtra, which contains one of the largest cities in India (Mumbai), also had the highest number of COVID-19 cases among the states during this time. The three other states with the highest numbers of tweets were among the next top 5 states in terms of numbers of COVID-19 cases. Kerala and Andhra Pradesh were the two other states that had high numbers of confirmed COVID-19 cases but relatively low numbers of tweets. The number of self-reports detected via supervised classification was relatively low, peaking at 109 reports on April 18th. In total, 374 COVID-19 self-reports were recorded in the first half of 2021. This finding differs from those of similar recent studies focusing on other geolocations, which showed high numbers of self-reports during early COVID-19 outbreaks27.
Figure 3.
State-level distribution of tweets about COVID-19 during our collection period (January to July 2021). Darker shades represent higher numbers of posts.
Syndromic surveillance
The most commonly discussed/reported symptom was fatigue, followed by cough and dyspnea (shortness of breath). The number of mentions of fatigue was more than double that of the next most reported symptom (cough). Other detected symptoms included headache, anosmia (loss of smell) and loss of appetite. All of these symptoms were among the top 8 symptoms of acute COVID-19 detected in reports by COVID-19-positive Twitter users. Relatively speaking, two symptoms that were underreported were pyrexia (fever) and body ache and pain.
Emotion analysis
Compared to the pre-outbreak time period, there was a detectable surge in negative emotions during the outbreak, namely fear, sadness, and anger (Figure 4). During April and May, which coincided with the outbreak period, up to six times as many tweets expressing fear, anger and sadness were posted as in the pre-outbreak period.
Figure 4.
Changes in negative emotions (anger, fear, sadness) detected from text during pre-and post-outbreak periods in India.
Emerging concerns
In addition to the increased volume of COVID-19 tweets during the outbreak, and the high levels of negative emotions and anxiety expressed in them, our NLP-driven analyses discovered three topics that emerged during the Indian outbreak: black fungus, oxygen supply, and COVID-19 vaccines. Many cases of mucormycosis, commonly referred to as black fungus, were detected in Asian countries, particularly India, following the outbreak of the delta variant of the COVID-19 virus30. In general, mucormycosis infections arise from multiple contaminated sources and tend to affect diabetic patients quickly. As the number of COVID-19 cases increased in India, the number of black fungus infections, particularly among diabetic patients, also increased31. Chatter about black fungus originating from India started rising in early May and peaked on the 20th of the month, approximately two months after the outbreak-related chatter surged on Twitter (Figure 5a). The timeline of the rise in black fungus chatter coincided with the extraordinarily high numbers of post-COVID cerebral mucormycosis infections in the country32.
Figure 5.
(a) Volume of tweets from India mentioning black fungus over time; (b) Volume of tweets mentioning ‘CoVaxin’ or ‘CoviShield’ over time. The figure also shows the timeframe when the supply of CoVaxin was reduced.
Vaccine-related chatter also surged during this wave of the outbreak in India, although, unlike black fungus, the increase in such chatter was steady from early 2021 and continued during the outbreak months. Two prominent vaccines have been widely used in India: CoVaxin and CoviShield. Among the vaccine-related chatter, nearly 44% of the users discussed CoVaxin, while 56% discussed CoviShield. Early in the year, CoVaxin was the most commonly discussed vaccine, but it was later surpassed by CoviShield (Figure 5b). Interestingly, the increase in CoviShield-related chatter relative to CoVaxin coincided with a reduction in the supply of the latter, around late April. Another topic of interest identified through Twitter chatter was oxygen supply. As COVID-19 infections rapidly increased, India faced a crisis in oxygen supply. This need for oxygen was reflected in the Twitter chatter: the most frequent mentions of oxygen included oxygen supply, oxygen bed, oxygen concentrators, and oxygen cylinders, as shown in Figure 6.
Figure 6.
Frequently used terms with oxygen and their number of mentions in the twitter chatter.
Early detection
Our findings also suggest that Twitter may be used for predicting COVID-19 outbreaks by detecting symptom-mentioning posts. We found a strong and significant correlation between the number of COVID-19 cases recorded and the count of symptom-mentioning tweets posted a week earlier (Spearman r=0.897, p<0.001). Early detection/prediction may have a significant impact by enabling us to estimate future hospitalization needs.
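A lagged correlation of this kind pairs each day's symptom-tweet count with the case count a fixed number of days later. The sketch below shows that alignment step together with a compact Spearman coefficient. The function names are hypothetical, both series are assumed to be daily and of equal length, and the closed-form formula used is only exact when neither aligned series contains tied values.

```python
def lag_pairs(leading, lagging, lag=7):
    """Align leading[t] with lagging[t + lag] for two equal-length daily series."""
    return leading[:len(leading) - lag], lagging[lag:]


def spearman_no_ties(x, y):
    """Spearman coefficient via 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result

    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def lagged_spearman(symptom_counts, case_counts, lag=7):
    """Correlation between symptom tweets and cases recorded `lag` days later."""
    earlier, later = lag_pairs(symptom_counts, case_counts, lag)
    return spearman_no_ties(earlier, later)
```

Scanning `lag` over a range of values and comparing the resulting coefficients is one simple way to check whether the symptom signal genuinely leads the case counts or merely moves with them.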
Discussion
Social media chatter, specifically on Twitter, encapsulates abundant information regarding COVID-19. The knowledge contained within this resource can potentially be leveraged to obtain real-time insights about the current pandemic and also future pandemics. Our explorations of large-scale Twitter data generated during the delta variant outbreak in India suggest that such data can be utilized to obtain multifaceted insights, in addition to detecting geolocation-specific outbreaks. While we used the Indian outbreak as our chosen topic, the methods outlined in this paper may be applied to any specific region of the world and to any social network. For our study, Twitter was a suitable social network as India has the third highest number of Twitter users in the world, following the United States and Japan33. The COVID-19 streaming API made it possible to collect user-posted data in real time. Most social networks provide APIs that can be leveraged to obtain real-time insights; thus, for monitoring pandemics in regions with lower numbers of Twitter users, the most relevant social networks can be used.
One of the primary challenges of using social media data for obtaining such multi-faceted insights is the difficulty of mining knowledge from the noisy text-based data that is generated34–36. The knowledge is typically hidden in large volumes of noise, non-standard terminologies, misspellings, and ambiguous expressions. While it is possible to extract relevant knowledge by manual inspection of each data point, the large volume of data makes manual curation on a continuous basis impossible, particularly in real time. Thus, such manual curation and analyses of data are generally retrospective in nature37,38. Importantly, manual curation is typically limited to small data samples, and relying on such curation methods takes away one of the major advantages of social media: the availability of big data. Thus, leveraging social media data optimally requires the development of advanced data science, NLP and machine learning methods, which can effectively characterize streaming data. In addition to addressing the above-mentioned challenges, such approaches also need to address emerging problems on social media, such as misinformation39,40, often referred to as an infodemic15.
One of the key findings of this work is the high correlation between the COVID-19 case numbers and the volume of Twitter chatter. States in India with higher numbers of COVID-19 cases also tended to have high volumes of chatter. We observed that states with large metropolitan cities such as Mumbai (Maharashtra), Bengaluru (Karnataka), and Chennai (Tamil Nadu) tend to produce higher volumes of chatter associated with the outbreak. There are two plausible reasons for this: (i) large metropolitan cities are generally the epicenters of outbreaks, which was no different in the case of the Indian outbreak; and (ii) such metropolitan cities also have large numbers of technologically adept young people, who make up the vast majority of social media users. In addition to mirroring the outbreak, the Twitter chatter also revealed symptoms reported/discussed by the users, which may be used to conduct syndromic surveillance. In areas with low numbers of testing locations, social media based syndromic surveillance may provide early signals about upcoming outbreaks. Interestingly, although there was a large volume of COVID-19-related chatter emerging from India, unlike prior studies conducted on populations from other countries (e.g., United States), we found a low number of self-reports of COVID-19 positive status. While exploring the reasons behind this is outside the scope of our work, we suspect it might be because of the stigma associated with COVID-1941. It is also possible that the tweet volumes were a response to the rising case numbers, and not necessarily predictive.
In addition to outbreak and syndromic surveillance, our study shows that Twitter data can be effectively utilized to assess population-level emotions associated with the pandemic. The rise in negative emotions may not only be a response to COVID-19, but also to the social distancing and ‘lockdown’ measures implemented by regional governments. A number of recent studies have elaborated on the negative mental health consequences of the pandemic42,43, and our study suggests that social media data at the time of an outbreak may be utilized for better understanding geolocation-specific mental health impacts (e.g., by studying/detecting expressed emotions). Interventions or social programs may be guided by population-level mental health assessments at specific times and places. While we attempted to assess emotions in a relatively simplistic manner, more sophisticated approaches under the umbrella of sentiment analysis44 might be employed to obtain more targeted information. For example, sentiment analysis methods have been employed in the past to study people’s perceptions about vaccines45 and treatments46. A recent systematic review discussed many applications of sentiment analysis approaches during pandemics, such as the current one, and infectious disease outbreaks47.
Strengths and limitations
The methods we described in this paper have several advantages compared to more traditional surveillance approaches. Firstly, the social media based surveillance methods discussed are purely data-centric, and they do not incorporate hypothesis-driven biases. For emerging health crises with many unknown issues, like the current COVID-19 pandemic, data-driven approaches can provide insights on topics that may have been overlooked under other circumstances. Secondly, streaming social media based monitoring can be done in real time or close to it. This can significantly reduce the lag times associated with many traditional surveillance approaches that rely on tools such as surveys or require compiling data from multiple sources (e.g., reports from testing centers). Rapid outbreaks, such as those caused by the COVID-19 delta variant, require rapid responses, and data-centric approaches over social media data may aid such processes. Finally, the large user base of social networks means that they can potentially provide access to hard-to-reach populations. While our study is exploratory by nature, the methods we applied may be built on to establish participatory surveillance through social media10,48. In our work, information flow was unidirectional, but there is a potential for establishing bidirectional social media based communication channels to complement the existing public health education, surveillance and intervention methods. Such bidirectional communication models have been explored in recent literature for targeted topics, such as the use of chatbots for mental health support49, but their utility for pandemic surveillance and response has not been a focus of research.
Our study has several limitations as well. The primary limitation stems from the user base of Twitter: users tend to be younger than the general population, so the user base is not representative of it. However, over recent years, social media adoption has increased significantly among older demographics50. Also, our study only focused on tweets that were in English. This limitation was introduced because the NLP tools we employed were not designed for multilingual text processing. Recent research in the field of computational linguistics has focused on developing multilingual corpora to aid multilingual NLP, and similar efforts have been reported for clinical texts51. Purely data-centric methods such as ours are also vulnerable to potential data manipulation by bots or automated accounts, and to misinformation. Both of these problems persist in all social media based studies, and researchers and social network administrators are currently actively engaged in reducing their impacts. Twitter, for example, has recently adopted a zero tolerance stance for accounts spreading misinformation, and actively suspends or closes such accounts.
Conclusion
Social media chatter, specifically on Twitter, encapsulates an abundance of information regarding COVID-19. The knowledge contained within this resource can potentially be leveraged to obtain real-time insights about the current pandemic and may help in forecasting future pandemics. While our study focuses solely on India, the same methods can be applied to conduct real-time surveillance in other countries, including the United States. It must be noted that while we have outlined the utilities of social media for pandemic surveillance, we do not advocate the replacement of traditional methods of surveillance by social media based ones. Traditional surveillance methods, such as those relying on testing center numbers, hospital admissions, and contact tracing, to name a few, have been established over the years through evidence-based research. Our findings suggest that social media has high potential for complementing traditional surveillance methods, and as the user base of social media grows, the utility of such platforms may increase further. Future research efforts should investigate further how social media can complement traditional surveillance methods, and also how it may be utilized for participatory surveillance and interventions.
Footnotes
† Available at: https://bit.ly/3yDhzvM. Accessed 7th August, 2021.
‡ The full set of supported terms is available at: https://developer.twitter.com/en/docs/labs/covid19-stream/filtering-rules. Accessed 7th August, 2021.
References
- 1.Sansa NA. Effects of the COVID-19 Pandemic on the World Population: Lessons to Adopt from Past Years Global Pandemics. SSRN Electron J. Published online April 1, 2020. doi:10.2139/ssrn.3565645.
- 2.King EJ, Dudina VI. COVID-19 in Russia: Should we expect a novel response to the novel coronavirus? https://doi.org/101080/1744169220211900317. Published online 2021. doi:10.1080/17441692.2021.1900317. [DOI] [PubMed]
- 3.Larsen DA, Wigginton KR. Tracking COVID-19 with wastewater. Nat Biotechnol 2020 3810. 2020;38(10):1151–1153. doi: 10.1038/s41587-020-0690-1. . doi:10.1038/s41587-020-0690-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Ram N, Gray D. Mass surveillance in the age of COVID-19. J Law Biosci. 2020;7(1):1–17. doi: 10.1093/jlb/lsaa023. . doi:10.1093/JLB/LSAA023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Alwan NA. Surveillance is underestimating the burden of the COVID-19 pandemic. Lancet. 2020;396(10252):e24. doi: 10.1016/S0140-6736(20)31823-7. . doi:10.1016/S0140-6736(20)31823-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Bernal JL, Andrews N, Gower C, et al. Effectiveness of Covid-19 Vaccines against the B.1.617.2 (Delta) Variant. https://doi.org/101056/NEJMoa2108891. Published online July 21, 2021. doi:10.1056/NEJMOA2108891.
- 7.Torjesen I. Covid-19: Delta variant is now UK’s most dominant strain and spreading through schools. BMJ. 2021;373 doi: 10.1136/bmj.n1445. :n1445. doi:10.1136/BMJ.N1445. [DOI] [PubMed] [Google Scholar]
- 8.O’Dowd A. Covid-19: Cases of delta variant rise by 79%, but rate of growth slows. BMJ. 2021;373 doi: 10.1136/bmj.n1596. :n1596. doi:10.1136/BMJ.N1596. [DOI] [PubMed] [Google Scholar]
- 9.Lancet T. India under COVID-19 lockdown. Lancet. 2020;395(10233):1315. doi: 10.1016/S0140-6736(20)30938-7. . doi:10.1016/S0140-6736(20)30938-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Suneela Bhatnagar N, Gangadharan N. A Case for Participatory Disease Surveillance of the COVID-19 Pandemic in India. JMIR Public Heal Surveill 2020;6(2)e18795 https//publichealth.jmir.org/2020/2/e18795. 2020;6(2):e18795. doi: 10.2196/18795. . doi:10.2196/18795. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Wilson N, Schwehm M, Verrall AJ, Parry M, Baker MJ, Eichner M. Detecting the re-emergent COVID-19 pandemic after elimination: modelling study of combined primary care and hospital surveillance | OPEN ACCESS. N Z Med J. 2020;133(1524):28–39. . Accessed August 7, 2021. https://www.nzma.org.nz/journal-articles/detecting-the-re-emergent-covid-19-pandemic-after-elimination-modelling-study-of-combined-primary-care-and-hospital-surveillance. [PubMed] [Google Scholar]
- 12.Farinholt T, Doddapaneni H, Qin X, et al. Transmission event of SARS-CoV-2 Delta variant reveals multiple vaccine breakthrough infections. medRxiv. Published online July 12, 2021. doi:10.1101/2021.06.28.21258780. [DOI] [PMC free article] [PubMed]
- 13.Mavragani A. Tracking COVID-19 in Europe: Infodemiology Approach. JMIR Public Health Surveill. 2020;6(2):e18941. doi:10.2196/18941.
- 14.Chaffey D. Global social media statistics research summary [updated 2021]. Smart Insights. Published 2021. Accessed August 8, 2021. https://www.smartinsights.com/social-media-marketing/social-media-strategy/new-global-social-media-research/
- 15.Cinelli M, Quattrociocchi W, Galeazzi A, et al. The COVID-19 social media infodemic. Sci Rep. 2020;10(1):16598. doi:10.1038/s41598-020-73510-5.
- 16.Ghiassi M, Lee S. A domain transferable lexicon set for Twitter sentiment analysis using a supervised machine learning approach. Expert Syst Appl. 2018;106:197-216. doi:10.1016/j.eswa.2018.04.006.
- 17.Sarker A, Ginn R, Nikfarjam A, et al. Utilizing social media data for pharmacovigilance: A review. J Biomed Inform. 2015;54. doi:10.1016/j.jbi.2015.02.004.
- 18.Chan B, Lopez A, Sarkar U. The Canary in the Coal Mine Tweets: Social Media Reveals Public Perceptions of Non-Medical Use of Opioids. Hildt E, ed. PLoS One. 2015;10(8):e0135072. doi:10.1371/journal.pone.0135072.
- 19.Yazdavar AH, Al-Olimat HS, Ebrahimi M, et al. Semi-Supervised approach to monitoring clinical depressive symptoms in social media. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2017. Association for Computing Machinery, Inc; 2017:1191-1198. doi:10.1145/3110025.3123028.
- 20.Sinnenberg L, Buttenheim AM, Padrez K, Mancheno C, Ungar L, Merchant RM. Twitter as a Tool for Health Research: A Systematic Review. Am J Public Health. 2017;107(1):143. doi:10.2105/AJPH.2016.303512a.
- 21.Statista. Most used social media 2021. Published 2021. Accessed August 7, 2021. https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/
- 22.Twitter. Twitter Developer Platform | COVID-19 stream. Published 2020. Accessed August 8, 2021. https://developer.twitter.com/en/docs/labs/covid19-stream/overview.
- 23.MongoDB. MongoDB Atlas: Cloud Document Database. Published 2021. Accessed August 7, 2021. https://www.mongodb.com/cloud/atlas/lp/try2
- 24.Dredze M, Paul MJ, Bergsma S, Tran H. Carmen: A Twitter Geolocation System with Applications to Public Health. In: Expanding the Boundaries of Health Informatics Using Artificial Intelligence: Papers from the AAAI 2013 Workshop. Association for the Advancement of Artificial Intelligence (AAAI); 2013:20-24.
- 25.Al-Garadi MA, Yang Y-C, Lakamana S, Sarker A. A Text Classification Approach for the Automatic Detection of Twitter Posts Containing Self-reported COVID-19 Symptoms. In: OpenReview.net; 2020:1-5.
- 26.Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Published online October 10, 2018. Accessed January 11, 2020. http://arxiv.org/abs/1810.04805.
- 27.Sarker A, Lakamana S, Hogg-Bremer W, Xie A, Al-Garadi MA, Yang Y-C. Self-reported COVID-19 symptoms on Twitter: an analysis and a research resource. J Am Med Informatics Assoc. 2020;27(8):1310–1315. doi:10.1093/jamia/ocaa116.
- 28.McCray AT, Aronson AR, Browne AC, Rindflesch TC, Razi A, Srinivasan S. UMLS® knowledge for biomedical language processing. Bull Med Libr Assoc. 1993;81(2):184–194. Accessed October 3, 2020. https://pubmed.ncbi.nlm.nih.gov/8472004/
- 29.Mohammad SM, Turney PD. Crowdsourcing a Word-Emotion Association Lexicon. Comput Intell. 2013;29(3):436–465. doi:10.1111/J.1467-8640.2012.00460.X.
- 30.Drissi C. Black fungus, the darker side of COVID-19. J Neuroradiol. Published online July 10, 2021. doi:10.1016/J.NEURAD.2021.07.003.
- 31.Slavin M, Thursky K. Mucormycosis: the black fungus hitting Covid-19 patients. BBC Future. Published May 19, 2021. Accessed August 16, 2021. https://www.bbc.com/future/article/20210519-mucormycosis-the-black-fungus-hitting-indias-covid-patients.
- 32.Gandra S, Ram S, Levitz SM. The “Black Fungus” in India: The Emerging Syndemic of COVID-19–Associated Mucormycosis. Ann Intern Med. Published online June 8, 2021. doi:10.7326/M21-2354.
- 33.Statista. Twitter: most users by country. Published 2021. Accessed August 8, 2021. https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/
- 34.Sarker A, Belousov M, Friedrichs J, et al. Data and systems for medication-related text classification and concept normalization from Twitter: Insights from the Social Media Mining for Health (SMM4H)-2017 shared task. J Am Med Informatics Assoc. 2018;25(10). doi:10.1093/jamia/ocy114.
- 35.Sloane R, Osanlou O, Lewis D, Bollegala D, Maskell S, Pirmohamed M. Social media and pharmacovigilance: A review of the opportunities and challenges. Br J Clin Pharmacol. 2015;80(4):910. doi:10.1111/bcp.12717. Accessed August 8, 2021. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4594734/
- 36.Staccini P, Fernandez-Luque L, Informatics F. Secondary Use of Recorded or Self-expressed Personal Data: Consumer Health Informatics and Education in the Era of Social Media and Health Apps. Yearb Med Inform. 2017;26(01):172–177. doi:10.15265/IY-2017-037.
- 37.D’Souza RS, D’Souza S, Strand N, Anderson A, Vogt MNP, Olatoye O. YouTube as a source of medical information on the novel coronavirus 2019 disease (COVID-19) pandemic. Glob Public Health. 2020;15(7):935–942. doi:10.1080/17441692.2020.1761426.
- 38.Valente PK, Morin C, Roy M, Mercier A, Atlani-Duault L. Sexual transmission of Zika virus on Twitter: A depoliticised epidemic. Glob Public Health. 2020;15(11):1689–1701. doi:10.1080/17441692.2020.1768275.
- 39.Suarez-Lledo V, Alvarez-Galvez J. Prevalence of Health Misinformation on Social Media: Systematic Review. J Med Internet Res. 2021;23(1). doi:10.2196/17187.
- 40.Enders AM, Uscinski JE, Seelig MI, et al. The Relationship Between Social Media Use and Beliefs in Conspiracy Theories and Misinformation. Polit Behav. Published online 2021:1. doi:10.1007/S11109-021-09734-6.
- 41.Chopra KK, Arora VK. Covid-19 and social stigma: Role of scientific community. Indian J Tuberc. 2020;67(3):284–285. doi:10.1016/j.ijtb.2020.07.012.
- 42.Hossain MM, Tasnim S, Sultana A, et al. Epidemiology of mental health problems in COVID-19: a review. F1000Research. 2020;9. doi:10.12688/F1000RESEARCH.24457.1.
- 43.Torales J, O’Higgins M, Castaldelli-Maia JM, Ventriglio A. The outbreak of COVID-19 coronavirus and its impact on global mental health. Int J Soc Psychiatry. 2020;66(4):317–320. doi:10.1177/0020764020915212.
- 44.Sunir Vuik S, Darzi A. Sentiment Analysis of Health Care Tweets: Review of the Methods Used. JMIR Public Health Surveill. 2018;4(2):e43. doi:10.2196/publichealth.5789.
- 45.Kang GJ, Ewing-Nelson SR, Mackey L, et al. Semantic network analysis of vaccine sentiment in online social media. Vaccine. 2017;35(29):3621–3638. doi:10.1016/j.vaccine.2017.05.052.
- 46.Sharma C, Whittle S, Haghighi PD, Burstein F, Keen H. Sentiment analysis of social media posts on pharmacotherapy: A scoping review. Pharmacol Res Perspect. 2020;8(5):e00640. doi:10.1002/prp2.640.
- 47.Alamoodi AH, Zaidan BB, Zaidan AA, et al. Sentiment analysis and its applications in fighting COVID-19 and infectious diseases: A systematic review. Expert Syst Appl. 2021;167:114155. doi:10.1016/j.eswa.2020.114155.
- 48.Leal-Neto O, Santos F, Lee J, Albuquerque J, Souza W. Prioritizing COVID-19 tests based on participatory surveillance and spatial scanning. Int J Med Inform. 2020;143. doi:10.1016/J.IJMEDINF.2020.104263.
- 49.Vaidyam AN, Wisniewski H, Halamka JD, Kashavan MS, Torous JB. Chatbots and Conversational Agents in Mental Health: A Review of the Psychiatric Landscape. Can J Psychiatry. 2019;64(7):456–464. doi:10.1177/0706743719828977.
- 50.Pew Research Center. Demographics of Social Media Users and Adoption in the United States. Published 2021. Accessed May 28, 2021. https://www.pewresearch.org/internet/fact-sheet/social-media/
- 51.Villena F, Eisenmann U, Knaup P, Dunstan J, Ganzinger M. On the Construction of Multilingual Corpora for Clinical Text Mining. Stud Health Technol Inform. 2020;270:347-351. doi:10.3233/SHTI200180.






