NPJ Digital Medicine. 2021 Mar 3;4:41. doi: 10.1038/s41746-021-00407-6

Digital public health surveillance: a systematic scoping review

Zahra Shakeri Hossein Abad 1,2, Adrienne Kline 1,3, Madeena Sultana 1,2, Mohammad Noaeen 4, Elvira Nurmambetova 1, Filipe Lucini 1,5, Majed Al-Jefri 1,3, Joon Lee 1,2,6
PMCID: PMC7930261  PMID: 33658681

Abstract

The ubiquitous and openly accessible information produced by the public on the Internet has sparked increasing interest in developing digital public health surveillance (DPHS) systems. We conducted a systematic scoping review in accordance with the PRISMA extension for scoping reviews to consolidate and characterize the existing research on DPHS and identify areas for further research. We used Natural Language Processing and content analysis to define the search strings and searched Global Health, Web of Science, PubMed, and Google Scholar from 2005 to January 2020 for peer-reviewed articles on DPHS, with extensive hand searching. Seven hundred fifty-five articles were included in this review. The studies were from 54 countries and used 26 digital platforms to study 208 sub-categories of 49 categories associated with 16 public health surveillance (PHS) themes. Most studies were conducted by researchers from the United States (56%, 426), and communicable disease topics dominated (25%, 187), followed by behavioural risk factors (17%, 131). While this review discusses the potential of Internet-based data as an affordable and instantaneous resource for DPHS, it highlights the paucity of longitudinal studies and the methodological and inherent practical limitations that hinder the successful implementation of a DPHS system. Few studies examined Internet users’ demographics when developing DPHS systems, and 39% (291) of studies did not stratify their results by geographic region. A clear methodology by which the results of DPHS can be linked to public health action has yet to be established, as only six (0.8%) studies deployed their system in a PHS context.

Subject terms: Diseases, Public health

Introduction

Internet technology is now a part of almost everyone’s life. Internet usage among US adults has increased steadily, from 52% in 2000 to 90% in 2019¹. Today, 97% of Internet users worldwide are active on social media, and the average number of social media accounts per Internet user has grown from 6.2 in 2015 to around 8 in 2019². The low-cost data stream available on social media and other Internet-based sources is increasingly harnessed by clinicians, patients, and the general public to disseminate insights into disease trends and promote healthy lifestyles and health policies3,4. Every minute, people around the world publicly share large volumes of personal and communal health information on different digital platforms5, such as social media, discussion forums and blogs, and Internet search engines. Digital surveillance data, inspired by the definition of digital epidemiology data by Salathé6, are publicly available user-contributed data not generated with the primary goal of surveillance. These data can provide a window into otherwise hard-to-reach populations and have become integral to digital public health surveillance (DPHS). Public health surveillance (PHS), as a tool for monitoring and targeting interventions7, is the ongoing systematic collection, analysis, and interpretation of data, tightly integrated with the timely dissemination of these data to those who can undertake effective prevention and control activities8,9. Beyond the unprecedented volume of digital data, these online resources, when used appropriately, can provide an increasingly clear picture of the dynamics and complexities of traditional PHS processes5,10. Compared with data captured through traditional PHS channels, digital resources contain information that can be harnessed to reduce the time to outbreak detection, add transparency to outbreak information published by governments, and facilitate public health (PH) responses to emerging diseases and population-related risk factors10.
These resources can be used either for infodemiology (using digital data for mining, analysis, and information aggregation, with the ultimate aim of informing PH and public policy) or for infoveillance (infodemiology methods with the main focus on surveillance)11. Infodemiology was first formally introduced by Gunther Eysenbach in 2002 to describe the distribution of health information and misinformation on digital platforms12. It was later extended to other areas of using digital data for PH research, such as outbreak detection, substance use, and drug utilization13.

The interactivity of the Internet and the highly networked, hyperlocal, and contextualized nature of digital data offer an unparalleled opportunity for the public, patients, and health officials alike to communicate and address health issues. Examples of how such opportunities have been realized include profiling vaccine criticisms14, mining patients’ narratives about drug experiences on open-access forums15, geospatially tracking the population during disease outbreaks, providing local and near real-time information for outbreak recognition16,17, and population-based clustering of behavioural risk factors such as physical inactivity, substance use, and poor diet in large populations18,19.

Effective DPHS requires an understanding of the potential and pitfalls of digital data for monitoring PH and exploring disease dynamics. Several narrative reviews of the application of digital media in PHS and epidemiology have been published20–26. Bernardo et al. reviewed 32 studies published between 2002 and 2011 that used search queries and social media data for infectious disease surveillance20. The authors concluded that, even though there are challenges associated with the quality of digital data, there have been successful applications of digital disease surveillance since 2006, and their performance in terms of cost, time, and accuracy compares favourably with that of traditional surveillance systems. This was confirmed by a recent scoping review on using web data for disease surveillance and epidemiology, in which Mavragani studied 338 articles from 2009 to 2018 and highlighted the potential of digital surveillance in health informatics research26. Newer reviews on this subject have dealt with the popularity of different surveillance domains over time and summarized recent methodological developments mapped to each domain27,28. The most recent and extensive digital surveillance review28 presented a timeline of online interest in PH and focused solely on the ethical and validity issues raised by the digital health monitoring revolution. While the topics covered in our review encapsulate those mentioned, this review expands on the notion of DPHS by exploring more platforms and a broader context within the PH field. Moreover, the existing reviews lack a systematic evaluation, and most cover only certain platforms or diseases/disorders. Therefore, we aimed to provide a comprehensive synthesis of evidence that fills both of these gaps in the extant literature while providing a proportional topic saturation level. Our scoping review also provides details on the use of digital media in different aspects of PHS.
This allows future researchers to identify where further work is most needed and which untapped opportunities deserve more attention in the digital surveillance sphere.

Results

To identify literature on DPHS, we conducted an iterative systematic search with extensive hand searching. Our scoping review was designed, implemented, and reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews guidelines (PRISMA-ScR)29. Other well-established guidelines exist for conducting systematic scoping reviews30–32. However, PRISMA-ScR was ideal for our review because of its detailed reporting guideline, demonstrative examples, and best practices for large-scale scoping reviews. The search yielded 4249 articles. After duplicates were excluded, 2907 studies remained, from which we selected 755 studies covering 16 PHS themes, associated with 49 PH categories and 208 sub-categories (Fig. 1). The complete list of included articles is provided in Supplementary Note 5 (a1–a755).
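The counts above determine how many records were dropped at each stage of the selection flow. A minimal sketch of that bookkeeping follows. The class and field names are illustrative assumptions, not part of PRISMA-ScR, and the sketch does not break exclusions down by reason or by screening stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionFlow:
    """Record counts at each stage of a PRISMA-ScR-style selection flow (illustrative)."""

    identified: int   # records returned by database and hand searching
    after_dedup: int  # unique records left after removing duplicates
    included: int     # records retained after screening and eligibility review

    @property
    def duplicates_removed(self) -> int:
        return self.identified - self.after_dedup

    @property
    def excluded(self) -> int:
        return self.after_dedup - self.included

    @property
    def inclusion_rate(self) -> float:
        """Share of unique (deduplicated) records that were finally included."""
        return self.included / self.after_dedup


flow = SelectionFlow(identified=4249, after_dedup=2907, included=755)
print(flow.duplicates_removed)        # 1342
print(flow.excluded)                  # 2152
print(f"{flow.inclusion_rate:.1%}")   # 26.0%
```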

Fig. 1. Flow diagram. The overall process of article selection following the PRISMA-ScR guideline.

Table 1 lists all PHS themes, their corresponding (sub)categories, and the relevant articles. These themes include behavioural risk factors (BRFs), cancer, chronic disease, communicable diseases, paediatric health, drug utilization, food and nutrition, health practices, health services, environmental hazards, mental health, mortality, vaccine, and urogenital/preconception. Articles that did not fit these topics but dealt with PHS were placed under the ‘others’ category (e.g., occupational safety). Each paper was assigned to the theme it was most closely affiliated with (e.g., BRFs for smoking behaviours, and mental health for suicide, depression, bipolar, or eating disorders). More than one context was permitted to capture topics that fit into two categories (e.g., eating disorders were placed in both the mental health and the chronic disease categories). Many papers harnessed digital data to study the quality of health services, so a category was created to reflect this. Within it, studies on health education/campaigns and communication were placed in a communication subgroup, whereas those involving emergency departments, nursing homes, and other health services were grouped into the accessibility and quality subgroups.

Table 1.

The hierarchy of public health-related themes studied by the included articles in this review.

Public health theme | Public health category | Public health sub-category
Behavioural risk factors | Smoking (a1–a53) | E-cigarette/JUUL (a9–a36), LCC (a37–a41), Hookah (a17, a42–a45), Water-pipe (a47, a48), Heat-not-burn (a49, a50), E-liquid (a51–a53)
Behavioural risk factors | Lifestyle (a54–a89) | Diet (a62–a66, a68), Physical activity (a64, a67–a71), Weight loss (a72, a73), Local health (a74–a82), Fitness (a83, a84), Sleep disorders (a85–a87), Sexual health (a88, a89)
Behavioural risk factors | Substance use (a90–a123, a123–a127) | Alcohol (a91–a105), Cannabis/Marijuana (a102, a106–a123), Dabbing (a124, a125), Mephedrone (a126)
Behavioural risk factors | Harassment (a128–a133) | Sexual (a128–a130), (Cyber)bullying (a131, a132), IPV (a133)
Cancer | Mortality (a134) | Breast (a134), Lung (a134)
Cancer | Prevention (a135–a141) | Cervical (a135, a136), Skin (a137–a140), Lung (a141)
Cancer | Awareness (a142–a165) | Breast (a145–a153, a157), Acute lymphoblastic leukaemia (a154), Diet (a155, a156), Smoking (a158), Prostate (a148, a151, a159), HNPCC (a160), Lung (a161, a162), Cervical (a166), Skin (a163), Colorectal (a164), Genitourinary malignancies (a165), Ovarian (a150)
Cancer | Behavioural measures (a166–a177) | Throat (a170), Breast (a175), Skin (a175), Melanoma (a175), Prostate (a175), Screening (a166, a176), Pancreatic (a177)
Chronic disease | General (a178–a193) | Diabetes (a180–a182, a184–a189), Third molar (a190), Molar incisor hypomineralization (MIH) (a193)
Chronic disease | Musculoskeletal (a194–a198) | Scoliosis (a194), Restless leg (a195), Osteoarthritis (a197), Gout (a198)
Chronic disease | Eating disorder (a199–a204) | Obesity (a199–a201, a203), Diabetes (a201, a202)
Chronic disease | Cardiovascular (a157, a178, a205–a211) | Cardiac arrest (a205), Heart disease (a157), Oral anticoagulants (a206), Vasculitis (a207), Hypertension (a178, a208), Heartburn (a209), Venous thrombosis (a210)
Chronic disease | Skin diseases (a212–a215) | Psoriasis (a213, a214), Pruritus (a215)
Chronic disease | Lung diseases (a216–a220) | COPD (a216, a217), Asthma (a218–a220)
Chronic disease | Neurological (a142, a221–a236) | Epilepsy (a222–a227), Willis-Ekbom (a228), Glaucoma (a229, a230), Multiple sclerosis (a231, a232), Tinnitus (a233, a234), ALS (a142, a235), Fibromyalgia (a236)
Chronic disease | Gastrointestinal (a237–a239) | Oesophageal (a238), Crohn’s disease (a239)
Chronic disease | Autoimmune (a240–a243) | Systemic Lupus Erythematosus (SLE) (a240–a242), Rheumatoid arthritis (a243)
Communicable diseases | Outbreaks (a62, a244–a376) | ILI/Influenza (a62, a245–a318), Dengue fever (a301, a319–a328), Ebola (a330–a346), Zika (a347–a366), Avian influenza (a367–a369), Norovirus (a371, a372), MERS (a373, a374), Chikungunya (a375, a376)
Communicable diseases | Sexually transmitted (a377–a394) | AIDS (a377–a379), HIV (a380–a389), HPV (a390), Syphilis (a392–a394)
Communicable diseases | Infectious diseases (a271, a395–a433) | Clostridium difficile (a401), Meningitis (a403), Measles (a404–a407), TBE (a408), Polio (a410, a411), Guillain-Barré (a413), Tuberculosis (a415), HFMD (a416), RSV (a417), Scarlet fever (a418), Plague (a419–a421), Cholera (a434), West Nile virus (a422), Pertussis (a271, a423–a426), Candida auris (a427), Lyme (a428–a430), Mayaro virus (a431), Malaria (a432), Hepatitis (a433)
Paediatric health | Awareness (a322, a435–a437) | DSFCs (a322), Paediatric fever (a435), SIDS (a436), Obesity (a437)
Paediatric health | Birth defects (a438–a440) | Pharmacoepidemiologic (a438), Intrauterine growth restriction (IUGR) (a440)
Paediatric health | General (a441, a442) | Accident (a441), Chicken pox (a442)
Drug utilization | Awareness (a443–a452) | Anabolic-androgenic steroid (AAS) (a448), Alternative medicine (a449), Stem-cell therapy (a450), Codeine (a451), Antiretroviral (a452)
Drug utilization | Drug safety/side effects (a453–a464) | Statins (a456), Illicit pharmacies (a457–a459), Bisphosphonate (a460), Psyclone (a461), Zolpidem (a462), Antimicrobial stewardship (a463)
Drug utilization | Adverse reaction (a277, a465–a484) | Atorvastatin (a473), Psychiatric drugs (a475), Glucocorticoid-related (a480), HIV (a481)
Drug utilization | Drug abuse (a485–a503) | Opioid (a485–a495), Fentanyl (a496), Heparinoid (a497), Recreational (a498, a499), Adderall (a500), Antidepressants (a501), Sea salt (a502)
Drug utilization | Post-marketing (a504–a506) | Sitagliptin (a504), Antidepressant (a505), Opioid (a506), Loperamide (a503)
Food and nutrition | Food safety (a507–a517) | (Un)healthy (a509–a511, a517, a518), Legislation (a512), Food poisoning (a513, a514), Food-borne illnesses (a515, a516)
Food and nutrition | General (a519–a523) | Marketing (a520, a521, a523), Online recipes (a522)
Health practices | Outcomes (a524–a527) | Rejuvenation (a524), Breast reconstruction (a525), Tanning (a526, a527)
Health practices | General (a231, a528–a537) | Dietary supplements (a530, a531), Sunburn (a533), Physical therapy (a534), Organ donation (a535), Bariatric surgery (a536), Plastic surgery (a537)
Health services | Quality assessment (a333, a538–a549) | Nursing care (a333, a539, a540), Hospitals (a541–a543), Emergency departments (a544, a545), Dermatologic care (a546), Surgery (a547, a548), Radiology (a549)
Health services | Accessibility (a550–a553) | Emergency departments (a550, a551), Physical therapy (a553)
Health services | Health communication (a93, a554–a584) | Awareness (a93, a555–a567, a571), Patient support (a568–a578), Health reforms (a579–a581), Crisis (a582), Heat alert (a583), Outbreak alert (a584)
Environmental | Pollen counts (a585–a596) | Seasonal allergic rhinitis (a585–a590), Epistaxis (a591), Air pollution (a592–a595), Sinusitis (a596)
Environmental | Syndromic (a597–a601) | Heat wave (a597–a601)
Environmental | Water quality (a602–a605) | Fluoridation (a602–a604), Lead (a605)
Environmental | Disaster/Crisis (a606–a608) | Winter storm (a606), Tornado (a607), Earthquake (a608)
Mental health | General (a62, a609–a644) | Suicide (a612–a626), Post-traumatic stress (a628), Depression (a62, a629–a637), Stress (a638, a639), Bipolar (a640, a641), Loneliness (a642, a643), OCD (a644)
Mental health | Emotion analysis (a645–a650) | Disaster/crisis (a646–a648), Outbreaks (a649, a650), Suicide (a651)
Mental health | Stigma (a644, a652–a656) | Suicide (a652, a655), Anxiety (a653), Self-harm (a656)
Mental health | Neurodevelopmental (a637, a657–a668) | ADHD (a657), ASD (a658), Schizophrenia (a637, a659–a663), Dementia (a664–a667), Psychotic (a668)
Mental health | Eating disorder (a669, a670) | Anorexia nervosa (a669)
Mortality | General (a61, a671–a675) | Awareness (a671), Socio-demographics (a672), Perinatal (a673), Stroke (a674), Accident (a675)
Mortality | Behavioural factors (a489, a676, a677) | Substance use (a489, a676), Suicide (a676), Social activity (a677)
Vaccine | Decision making (a678–a708) | Paediatric (a688–a691), HPV (a692–a704), Influenza (a705), Herpes zoster (a706), Polio (a707), Measles (a702)
Vaccine | Adverse event (a709, a710) | Influenza (a709), Anxiety-related (a710)
Vaccine | Coverage (a329, a711, a712) | Influenza (a329), HPV (a711, a712)
Vaccine | Awareness (a713–a722) | HPV (a713–a718), Flu (a719), Rotavirus (a720), Measles (a721), Autism (a722)
Urogenital/Preconception | Genital (a62, a723–a729) | Abortion (a723), C-section (a725), Pregnancy (a62, a726–a728), Morcellation (a729)
Urogenital/Preconception | Renal (a730–a732) | Kidney stone (a730, a731), Dialysis (a732)
Urogenital/Preconception | Urinary (a733) | Urinary tract infection (UTI) (a733)
Others | Toothache (a734–a737) | Teething (a737)
Others | Sexual dysfunction (a738, a739) | Peyronie (a738), Ejaculatory dysfunction (a739)
Others | Animal health (a740, a741) | Slaughterhouse (a740), Marine litter (a741)
Others | Disease burden (a742, a743) | Skin diseases (a743)
Others | Occupational safety (a744–a747) | Chemical poisoning (a744), Accidents (a745), Silicosis (a746), Injuries (a747)

An article could be linked to only the ‘category’ column if it did not address any sub-categories listed in the sub-category column.

TBE tick-borne encephalitis, DSFC delayed subaponeurotic fluid collections, ADHD attention deficit hyperactivity disorder, HNPCC hereditary non-polyposis colorectal cancer, HFMD hand, foot and mouth disease, RSV respiratory syncytial virus, OCD obsessive compulsive disorder, IPV intimate partner violence, ALS amyotrophic lateral sclerosis, SIDS sudden infant death syndrome.

The surveillance theme with the largest number of publications was communicable disease surveillance, at 25% (187). The sharp rise in the volume of communicable disease publications coincides with the 2016 Zika outbreaks. In 2016, ILI-focused studies were the most common communicable disease studies (53%), following a distribution similar to the overall trend of all such studies. In 2017, Zika-focused studies were the most common (36%). Publications in 2017 covered a greater variety of health events (Fig. 2).

Fig. 2. The most frequently addressed PHS themes. The temporal trends of the two most prevalent themes of DPHS systems in the literature.

A large proportion of BRF studies can be linked to policy changes. The peak of e-cigarette publications in 2016 and 2017 (Fig. 2) may be attributed to growing international concern in the preceding years, as policymakers noticed vaping products being marketed towards youth and young adults. A congressional report in the USA33 and the WHO FCTC34, both in 2014, may have prompted increased research in this area in subsequent years. Similarly, the sudden academic interest in cannabis research in 2016 may result from the rapid legalization and decriminalization of medicinal and recreational cannabis in the preceding years (Fig. 2).

Countries, affiliations, and surveillance systems

A total of 79% (593) of the studies included in this review were published by researchers from the USA (426), UK (51), Australia (44), Canada (36), and Italy (36). The most common surveillance themes researched in these countries included communicable diseases, BRFs, chronic disease, drug utilization, and mental health (Fig. 3a).

Fig. 3. The distribution of studies based on country and affiliation, mapped to different PHS themes. a Top five countries and PHS themes. b The frequency of different combinations of affiliations, PHS themes and the average number of authors per country.

About 94% (707) of the studies involved authors affiliated with academia, of which 460 had exclusively academic affiliations. Only 3% (23) of studies had an author affiliated with a government; ten of these studied communicable diseases, and three studied the general aspects of PH (Fig. 3b). None of these government-affiliated studies investigated vaccine, environmental hazard, or health practice surveillance systems. Studies that used datasets with no geographic focus (36%, 268) were dominated by BRFs, communicable diseases, and chronic diseases. The majority of studies with geographically focused datasets used country-level data, and only 0.7% used ZIP-code-level datasets. The studies in this category were dominated by communicable disease, BRF, and health services surveillance systems (Fig. 3b).

Social media platforms and surveillance systems

Since 2005, the three most commonly studied digital platforms have been, in descending order, Twitter, Google Trends, and Facebook. Their use increased sharply from fewer than three studies per year in 2009, reaching 78, 49, and 13 studies, respectively, in 2019. Google Trends and Google Flu Trends (GT and GFT) were used by 41% (76) of publications on communicable diseases, of which 57% (43) aimed to predict outbreaks and seasonal diseases. Of the 69 studies that used Twitter to study communicable diseases, 32% (22) mined tweets for outbreak prediction. Facebook, Instagram, and YouTube were mainly used to study BRFs, focusing on smoking, substance use, and lifestyle. Fifty percent of studies that used Yelp investigated topics related to ‘health services’, whereas this figure was less than 2% for Facebook, YouTube, Instagram, and GT (Fig. 4). Almost half of the studies on ‘mental health’ used Twitter data, and 11 studies used GT to observe seasonal patterns in Internet search volume for a wide range of mental health terms. More details about the digital platforms used by the included studies are presented in Supplementary Note 3.

Fig. 4. The temporal trend of surveillance domains associated with a cross-tabulation of surveillance domains and social media platforms (darker shades represent smaller values). Surveillance systems that used more than one platform were assigned to multiple platforms, up to a maximum of five. Studies that investigated more than five platforms are mapped to the ‘Social Media Platform’ column.
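The multi-platform rule in the Fig. 4 note can be written as a small counting routine that builds the theme-by-platform cross-tabulation. This is only an illustration under our own assumptions. The study records below are hypothetical (theme, platform list) pairs. Counting a platform named twice within a study only once is also our choice, because the note does not address that case.

```python
from collections import Counter

MAX_PLATFORMS = 5  # cap stated in the Fig. 4 note
GENERIC_COLUMN = "Social Media Platform"


def platform_columns(platforms: list[str]) -> list[str]:
    """Return the platform column(s) a study is counted under."""
    unique = sorted(set(platforms))  # assumption: repeated names count once
    if len(unique) > MAX_PLATFORMS:
        return [GENERIC_COLUMN]
    return unique


def cross_tab(studies: list[tuple[str, list[str]]]) -> Counter:
    """Count (theme, platform column) pairs across all studies."""
    counts: Counter = Counter()
    for theme, platforms in studies:
        for column in platform_columns(platforms):
            counts[(theme, column)] += 1
    return counts


# Hypothetical study records: (surveillance theme, platforms used).
studies = [
    ("Communicable diseases", ["Twitter", "Google Trends"]),
    ("Behavioural risk factors", ["Instagram", "YouTube"]),
    ("Mental health", ["Twitter", "Reddit", "Facebook", "YouTube", "Instagram", "Tumblr"]),
]
table = cross_tab(studies)
print(table[("Communicable diseases", "Twitter")])  # 1
print(table[("Mental health", GENERIC_COLUMN)])     # 1
print(table[("Mental health", "Twitter")])          # 0
```

Using a `Counter` keyed by (theme, column) keeps missing combinations at zero without pre-allocating a full grid.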

Methods—data collection duration

Data collection duration varied widely (Fig. 5). Overall, 36% (268) of the included studies had a duration of more than 2 years, 14% had a duration of 1–2 years, and 40% had a duration of less than 1 year, with most of the latter covering less than 6 months. All surveillance themes followed similar distributions, with some notable exceptions: 53% of chronic disease publications had a duration greater than 2 years, compared with 44% for communicable diseases and 21% for BRFs. Urogenital publications had the shortest data collection durations, with 34% lasting less than 1 month. As Table 1 shows, the associated PH categories (i.e., genital, renal, and urinary) involve events with a typically short onset and duration. Moreover, 98% (740) of studies based their analysis on secondary data, that is, longitudinal data that are sometimes collected months or years after the event occurred35. Thus, surveillance systems developed from secondary data analysis are more useful for long-term than for short-term interventions35.
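Figure 5 groups studies by how long their data collection lasted. A minimal sketch of that binning follows, assuming durations are recorded in months. The exact bin edges are our assumption, since the text reports only the bin labels. For example, the sketch counts exactly 24 months as 1–2 years.

```python
from collections import Counter


def duration_bin(months: float) -> str:
    """Map a data collection duration (in months) to a reporting bin.

    The bin edges are assumptions chosen to match the labels used in the text.
    """
    if months < 1:
        return "< 1 month"
    if months < 6:
        return "1-6 months"
    if months < 12:
        return "6-12 months"
    if months <= 24:
        return "1-2 years"
    return "> 2 years"


def bin_shares(durations: list[float]) -> dict[str, float]:
    """Proportion of studies falling in each duration bin."""
    counts = Counter(duration_bin(d) for d in durations)
    return {label: n / len(durations) for label, n in counts.items()}


# Hypothetical durations (months) for eight studies.
shares = bin_shares([0.5, 3, 3, 9, 18, 30, 36, 60])
print(shares["> 2 years"])   # 0.375
print(shares["1-6 months"])  # 0.25
```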

Fig. 5. Data collection duration. The differences in data collection duration across included studies and the proportion of articles within each time frame across all surveillance systems.

Methods—objectives, data analysis, and findings

We classified the studies based on their overall data collection and analysis methodology (Fig. 6). Studies focused mainly on mining, analysis, and information aggregation to inform PH and public policy were placed in the infodemiology category (77%). Studies that emphasized surveillance were classified as infoveillance (23%)11. Not surprisingly, 112 (60%) of the publications on communicable diseases are infoveillance studies. This could be because existing digital data, such as search queries and access logs, have great potential for exploring the public’s digital behaviour and detecting epidemic outbreaks. The main objective of infodemiology publications was to mine users’ status updates (O13, 32%), and the most common finding was providing baseline data (F16, 23%). Conversely, infoveillance studies were dominated by those that showed the predictability (F13, 28%) and applicability (F1, 22%) of digital data for outbreak detection (O14, 31%).

Fig. 6. The top charts illustrate several aspects of the included studies. They show the mapping between PHS topics and the objectives [O] and findings [F] of the corresponding studies, and the frequency of infoveillance/infodemiology studies for each topic. They also show the techniques the included publications used to evaluate how well their proposed approach addressed the key objectives of a surveillance system. The bottom charts represent the temporal trends of the data analysis used by the included studies and the frequency of articles that identified age, gender, or place in their datasets.

Objectives and findings

From the manual content analysis of the objectives and findings of the included studies, eighteen distinct strands of investigation emerged. The top three were ‘providing baseline information’ on risk patterns and trends in the occurrence of various health events (22%, 163), exploring the ‘applicability’ of web-based platforms in PHS systems (13%, 98), and ‘identifying user’s digital behaviour’ to evaluate the correlation between online activity and the incidence and temporal trends of risk factors (11%, 84) (Fig. 6).

Detecting unhealthy advertisements (O1) is the second most frequent objective associated with BRF publications, with 89% (16) of them related to smoking (69%: e-cigarette/JUUL and LCC). Seventy-five percent (12) of these publications showed the prevalence of advertisements promoting smoking behaviour (F14), and 19% (3) explored the marketing strategies used by smoking vendors (F10). This implies that digital resources are being used as marketing platforms for different smoking brands, which may carry major PH risks (Fig. 6). Exploring public opinion (O5) and sentiment (O6) towards immunization are the most common objectives in publications on vaccine surveillance (48%, 23). These objectives are mainly mapped to supportive attitudes (F18) and negative sentiments (F12), respectively. These findings suggest a need for educational information tailored to different social media platforms. Such information should focus mainly on users who are at risk of excessive exposure to anti-vaccine information. For example, men are far more likely to express a negative opinion about HPV immunization than womena695. In addition, users who are more often exposed to negative opinions about HPV vaccines are more likely to subsequently post negative messagesa697.

Twenty-one percent (13) of publications on drug post-marketing/utilization reported on the applicability (F1) of using Internet-based data to explore drug safety/adverse drug reactions (ADRs) (85%), post-marketing (8%), and drug abuse (7%). Interestingly, two studies showed that Twitter might not be a useful platform for this purpose, as ADR reports on Twitter usually underrepresent specific drugs and often do not meet the FDA criteria required for reporting an ADRa468, a476. This is in line with a recent systematic review showing that the prevalence of ADR reports on social media ranges from 0.2% to 8% of all postings36. Sixty-three percent (19) of mental health studies reported risk indicators (F7), of which 73% (14) were related to self-harm or suicide attempts. Publications in this category discussed several exploratory techniques, including linguistic analysis methodsa652, time-varying features related to suicide risk factorsa625, mapping the digital behaviour of different age groups to these indicatorsa610, a622, and emotion analysisa645. In oncology, exploring the digital behaviour of users (F4) can identify temporal trends in cancer risk factor queries, cancer incidence and mortality, and interest in cancer screening, compared with other information-seeking domains37. Thirty-eight percent (5) of studies in the [Cancer/F4] category used GTa167, a169, a170, a175 and the Yahoo Buzz Index (YBI)a168 to conduct search-based cancer surveillance. Another 23% (3) mined user-generated content (O13) on Twittera161, a171, a173 to study cancer information-seeking behaviours and the incidence of some types of cancer.

Age/gender/place and temporal trends of data analysis

Given that the primary purpose of surveillance is to monitor and assess the overall health status of population subgroups9, analyzing time, demographics (age, gender), and place is a critical component of any PHS system35. Since the rise of Internet-based data usage in PHS, great strides have been made in identifying place, gender, and age from anonymous self-reported information on the Internet. The included studies explored these variables with techniques such as mining users’ profile informationa37, a199, content analysisa132, a162, a727, population surveysa318, a508, mapping to local demographic dataa630, and third-party toolsa120, a201. However, relatively few studies have systematically incorporated these epidemiologic parameters into their data analysis, despite their value in identifying risk groups (Fig. 6). Moreover, validity, misclassification of users38, and under-counting caused by sampling bias39 are challenges that still need to be addressed. The data analysis of 61% (460) of studies reflects the results of a specific time window. Excluding communicable diseases, this is the most common type of temporal analysis in all reported surveillance systems. Conversely, temporal analyses of the ‘epidemic occurrence’ of a disease and of ‘seasonal patterns’ have been the most commonly used inferential approaches for communicable disease data (Fig. 6). Thirty-two percent (242) of studies did not capture any of the age/gender/place variables in their data analysis, most of them from the BRFs category.
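Once place, gender, or age has been inferred for each record (for example, from profile information), the data can be stratified by these variables and by time. The sketch below uses a hypothetical `Post` record and plain-Python counting. The field names and example values are illustrative assumptions. The sketch also reports how often each variable is missing, which bears on the under-counting and misclassification concerns noted above.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Post:
    """A hypothetical user-contributed record after demographic inference."""

    posted: date
    region: Optional[str] = None     # e.g., inferred from profile location
    gender: Optional[str] = None
    age_group: Optional[str] = None


def stratify(posts: list[Post], field: str) -> Counter:
    """Count posts per (field value, year-month), skipping posts where the field is unknown."""
    counts: Counter = Counter()
    for post in posts:
        value = getattr(post, field)
        if value is None:
            continue
        counts[(value, post.posted.strftime("%Y-%m"))] += 1
    return counts


def missing_share(posts: list[Post], field: str) -> float:
    """Fraction of posts for which the field could not be identified."""
    return sum(getattr(p, field) is None for p in posts) / len(posts)


posts = [
    Post(date(2019, 1, 5), region="Alberta", gender="female"),
    Post(date(2019, 1, 20), region="Alberta"),
    Post(date(2019, 2, 2), region="Ontario", age_group="18-29"),
    Post(date(2019, 2, 9)),
]
print(stratify(posts, "region")[("Alberta", "2019-01")])  # 2
print(missing_share(posts, "age_group"))                 # 0.75
```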

Evaluation of the surveillance system

Seventy-four percent (561) of studies evaluated the usefulness of their proposed DPHS system by mapping the system’s objectives to its outcomes. Among these, 361 (48% of the total) were evaluated subjectively, 116 (15%) used quantitative methods such as statistical analysis and machine learning (ML) techniques, and 85 (11%) used surveys or qualitative analysis methods. Twenty-five percent (192) of studies used the ‘representativeness’ approach to explore how accurately the characteristics of reported events represent the incidence of actual health events40 (Fig. 6). About two-thirds (64%, 120) of the articles on communicable diseases used this approach, followed by studies on environmental hazards (43%, 10). Measuring the inclusivity of a system requires rate calculations (e.g., the seasonal/cyclic incidence of a health event). These calculations need an entirely separate data system maintained by an external agency (e.g., Centers for Disease Control and Prevention (CDC) ILI data), so this approach may be harder to apply to other surveillance systems.
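A common way to assess representativeness, as the CDC ILI example above suggests, is to compare a digital signal (for example, weekly search volume) with an external reference series. The sketch below is an illustration only. It assumes Pearson correlation as the comparison metric and scans for the lead time at which the digital signal best tracks the reference, since earlier detection is one appeal of digital data. The series are made up, and a high correlation on its own does not show that a system captures true incidence.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def best_lead(signal: list[float], reference: list[float], max_lead: int) -> tuple[int, float]:
    """Find the lead (in periods) at which the digital signal best tracks the reference.

    A lead of k compares signal[t] with reference[t + k].
    """
    best_k, best_r = 0, pearson(signal, reference)
    for k in range(1, max_lead + 1):
        r = pearson(signal[: len(signal) - k], reference[k:])
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


# Hypothetical weekly series: search volume and official ILI rates.
search_volume = [1, 2, 3, 4, 5, 4, 3, 2]
ili_rate = [0, 1, 2, 3, 4, 5, 4, 3]
print(best_lead(search_volume, ili_rate, max_lead=2))
```

In this made-up example the reference series is the search signal shifted by one week, so the best lead is one period with a correlation of essentially 1.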

Data types and analysis methods

Figure 7 summarizes the frequency of the different data types used by the included studies, their mapping to PHS themes, and the proportion of studies that applied ML techniques to each data type. Textual data had the highest proportion of ML applications (31%), and none of the studies that used video data applied ML. This meagre rate reflects the many pitfalls involved in analyzing Internet-based data. ‘Search queries’ is the second most frequent data type. Given its popularity, the limitations of search query analysis must be considered. These include dynamic changes in health information-seeking behaviour and uncertainty about whether information seekers are representative (e.g., some searches may be generated by bots or prompted by news reports). A further limitation is the small amount of geographic data that can be gleaned from this data type.
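Search query data such as Google Trends are released as relative rather than absolute volumes, so the normalization behind them shapes what the numbers can show. Google describes each Trends data point as divided by total searches for its geography and time range and then scaled to a 0–100 range. The sketch below imitates that idea on made-up counts. It is a simplified approximation, not Google's actual procedure, which according to Google also works from a sample of searches.

```python
def relative_search_volume(term_counts: list[int], total_counts: list[int]) -> list[int]:
    """Scale raw term counts to a 0-100 index relative to the peak share.

    Each point is first divided by total search activity for that period, so the
    index tracks the term's share of all searches rather than its raw volume.
    """
    shares = [t / total if total else 0.0 for t, total in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0] * len(shares)
    return [round(100 * s / peak) for s in shares]


# Hypothetical weekly counts: searches for a symptom term and all searches.
term = [10, 30, 30]
all_searches = [100, 200, 300]
print(relative_search_volume(term, all_searches))  # [67, 100, 67]
```

In this example, searches for the term triple between the first and third weeks, yet the index does not change, because total search activity also tripled. This illustrates why shifts in overall information-seeking behaviour complicate the interpretation of search query data.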

Fig. 7. Data types and analysis methods. The mapping between data types used by the included studies and the PHS systems, platforms, and the use of machine learning.

Discussion

Key findings

We report a comprehensive scoping review that summarizes and synthesizes evidence from a large and heterogeneous body of literature on DPHS. The growing body of evidence on DPHS reflects the chronological availability of new digital platforms and new data mining and ML techniques. Our findings show the strong effect of mass media on the public’s information-seeking behaviour. Exploring these behaviours can help PH officials tailor their messages to address PH interests and improve healthcare delivery.

Digital data can help portray the dynamics of PHS systems. They also allow PH professionals to pinpoint the public’s general concerns or needs during infectious disease events and create location-specific campaigns. For example, one study found no association between dental caries and toothache-related information-seeking behaviour among South American Google users. This finding may reflect this population’s unfamiliarity with the relationship between dental pain and the final stages of chronic oral diseasesa735.

Our findings show that digital surveillance systems were most prevalent for communicable diseases (25%, 187). One possible reason is that topics such as seasonal outbreaks, epidemics, and sexually transmitted and other infectious diseases can all be grouped into this category, making it a far-reaching one. Another reason may be the ease of obtaining relative search volumes for various outbreak-related and infectious disease terms from Google Trends and access logs from social media platforms, as well as the fear and hype surrounding infectious diseases and epidemics such as H1N1, Ebola, and Zika. Very few papers dealt with ‘disease burden’ (0.3%) and ‘occupational safety’ (0.5%), which was surprising given the ready availability of Google Trends data.
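Google Trends reports relative search volume rather than raw counts: each data point is the share of all searches in the selected region and period, rescaled so that the peak equals 100. The Python sketch below illustrates that normalization under stated assumptions: the raw query and total-search counts are hypothetical (Google Trends does not expose them), and `relative_search_volume` is an illustrative name, not part of any Google API.

```python
def relative_search_volume(query_counts, total_counts):
    """Rescale each period's query share so that the peak period equals 100.

    query_counts and total_counts are hypothetical raw counts per period;
    Google Trends itself only publishes the rescaled index.
    """
    if len(query_counts) != len(total_counts):
        raise ValueError("both series must have the same length")
    # Share of all searches in each period (0 when no searches were recorded).
    shares = [q / t if t > 0 else 0.0 for q, t in zip(query_counts, total_counts)]
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0] * len(shares)
    # Scale to a 0-100 index relative to the peak share.
    return [round(100 * s / peak) for s in shares]


print(relative_search_volume([50, 120, 300, 90], [10000, 12000, 15000, 9000]))
# [25, 50, 100, 50]
```

Because the index is rescaled for every request, values for different queries or regions are generally only comparable when they are retrieved together in the same comparison.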

The surveillance themes studied by each country appear to follow international trends (Fig. 3a). Interestingly, the USA and Australia had a greater proportion of articles studying BRFs, which may reflect international differences in the prevalence of these risk factors. For instance, according to the UN World Drug Report (2016), the prevalence of cannabis use in the USA and Australia in 2015 exceeded the European average by roughly 4% [41]. Although cannabis remains the most commonly used illicit drug in both countries, Australia has seen a drastic rise in the use of amphetamines and other illicit drugs since 2012. The USA has the largest e-cigarette market and the most reported cases of vaping-related illness, particularly among young people. Furthermore, both countries have a high prevalence of overweight and obesity: recent reports show that 67% of Australian adults and 71% of American adults (over the age of 20) are overweight. Combined, these factors may contribute to increased research on smoking, lifestyle habits, and illicit substance use, which in turn increases the proportion of behavioural risk factor publications.

While user-generated information on the Internet certainly shows promise, especially as an alternative and inexpensive solution for PHS, questions remain regarding the validity and generalizability of social media and Internet data [28]. Given the short length of many posts (e.g., a tweet), the differing language styles of Internet users, and the absence of any restriction on their writing style, user-generated content often contains a high amount of noise, making automatic information extraction and classification of free-text data challenging and time-consuming. Moreover, many concerns have been raised about the correctness and quality of health-related digital data and the detrimental effects that misinformation can have on PH [42]. This concern was also apparent during the 2014 Ebola outbreak [a335] and the 2016 Zika outbreak [a354, a357, a359, a366]. Table 2 lists the included studies that investigated the spread of inaccurate or incomplete health-related information on the Internet. The number of studies in this category increased from 21 in 2015 to 60 in 2019, with a spike in 2017; these studies comprise 8% of all included studies. Digital misinformation can spread quickly but is difficult to refute. As listed in Table 2, the majority of research on PH-related misinformation has focused on the communicable disease and BRF surveillance systems, and most of the misinformation reported by the included studies proliferated via Twitter, news websites, and Facebook, in that order. Sixty-seven percent (40) of these studies analyzed textual data, and 18% (11) contained video data. Among the studies without a geographic focus, the most frequently investigated topics were drug utilization, chronic diseases, and vaccines, in that order. Interestingly, studies that investigated misinformation in a specific geographical zone mainly focused on the BRF, communicable disease, and health services surveillance systems.
Despite this growing body of work, there is still a clear need for a valid assessment of the potential harm associated with digital health misinformation and its relative impact across different surveillance systems.

Table 2.

Studies that detected inaccurate or incomplete information in the context of DPHS, mapped to various PHS themes/categories and digital media platforms. [FB]: Facebook, [GT]: Google Trends, [NW]: News Websites, [SW]: Specific Websites, [YA]: Yahoo Answers, [WA]: WhatsApp, and [YT]: YouTube.

Surveillance System (n) Subgroup FB Forums GT NW Reddit SW Twitter WA Weblogs Weibo Wikipedia YA YT
Public health (10) General a748 a749 a750, a751 a752
Disease comparison a753
Dental a754 a754 a754 a754
Behavioural risk factors (17) Smoking and genetic a46 a46 a46
E-cigarette a27 a9, a27 a27 a27
Alcohol a103 a93 a91, a103 a93
Cannabis a107, a119 a117
Cancer (4) Breast a153
Diet a155
Awareness a165 a165
Drug utilization (8) General a755
ADR a478 a468
Psyclone a461
Awareness a444, a445
Alternative medicine a449
Stem-cell therapy a450
Paediatric health (3) DSFCs a322
IUGR a440 a440
Chronic diseases (5) Obesity a203
COPD a216
Heart disease a157
Hypertension a208
Scoliosis a194
Communicable diseases (12) Zika a354, a359 a357 a366 a359
Avian influenza a369
Food-borne illnesses a516
Clostridium difficile a401
HPV a390 a390
Ebola a335
Lyme a430
Reproductive health (2) C-section a725
Pregnancy a727
Health communication (5) Knee arthroscopy a563
Suicide a562
Tinnitus a233 a233 a233
Mental health (2) ADHD a657
Psychotic a668
Vaccine (4) HPV a713, a714 a697
Decision making a683
Environmental (4) Water fluoridation a602 a602 a602 a602
Food and nutrition (1) General a519
Health practices (2) Rejuvenation a524 a524
Mortality (3) Awareness a671 a671 a671
Occupational safety (1) Brain injury a747

Limitations of the included studies

First, 61% (460) of the included studies conducted cross-sectional analyses (Fig. 6) and were thus unable to evaluate the longitudinal or temporal dynamics of their findings. These findings might change over time, and longitudinal analysis would be needed before they are used by PH decision-makers. Ten percent (75) of studies did not even report the time scale of their analysis and only reported the analysis results. Even if a temporal analysis is uninformative, the usefulness of a PHS system needs to be assessed periodically to ensure that it is serving a useful PH function [35].

Second, the majority of the studies that used digital data for PHS (77%, 581) were exploratory in nature and sought to gather information and data to inform PH officials about the potential of DPHS in different areas of PHS (Table 1). Among these studies, 28% (165) provided baseline data (F16 in Fig. 6), 17% (98) investigated the applicability and feasibility of digital data for PHS (F1), and 28% (163) studied users’ digital behaviour and their concerns and opinions about different aspects of PH (F4, F6, F12, and F18). While these studies provide valuable information on the potential of DPHS, they represent only the first three steps of a PHS process (i.e., planning and design, data collection, and data analysis; Fig. 8) and lack real-world evaluation (e.g., sensitivity and representativeness analysis) and system deployment.

Fig. 8. The overall iterative process of a public health surveillance system.

The phase coloured in red highlights the key difference between traditional and digital public health surveillance. The limitations of current research on DPHS discussed throughout this review are mapped to, and listed below, each activity of the process.

Third, around 40% (299) of studies were limited in sample size and scope because they used labour-intensive methods such as manual coding and qualitative analysis. The majority of the 219 studies that applied NLP methods used rule-based and lexical matching techniques such as topic modelling, sentiment analysis, and language modelling. These methods mainly extract abstract, high-level themes, and the subjectivity involved in interpreting their results might limit the generalizability and accuracy of these studies’ findings.
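To illustrate why lexical matching captures only surface-level signals, here is a minimal lexicon-based sentiment scorer in Python. The tiny `LEXICON` is a hypothetical stand-in for the curated lexicons such studies typically rely on, and the scorer simply sums word polarities, so it cannot handle negation or context; it is a sketch of the general technique, not a reconstruction of any included study’s pipeline.

```python
import re

# Hypothetical toy polarity lexicon (illustration only, not a validated resource).
LEXICON = {
    "good": 1, "better": 1, "relief": 1, "safe": 1,
    "bad": -1, "worse": -1, "pain": -1, "scared": -1,
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_sentiment(text: str, lexicon: dict[str, int] = LEXICON) -> int:
    """Sum the polarity of every token that appears in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


print(lexicon_sentiment("The vaccine made my pain worse"))  # -2
print(lexicon_sentiment("Honestly, not bad at all"))        # -1: the negation is missed
```

The second call shows the kind of error that makes purely lexical scores hard to interpret without manual review.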

Fourth, content bias is another limitation of the included studies. User-generated content on the Internet is highly biased, as it reflects information that people are comfortable revealing and may not represent the real spectrum of their feelings and experiences. In addition, our results show that among the 554 studies that used text, image, or video data, only 20% (111) took into account whether their findings were associated with the user’s personal experience (i.e., self-reported) or not. Thus, there is a clear need for studies capable of detecting and mitigating the content biases that affect the formation and adoption of digital data for PHS.
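Separating self-reported experiences from other content is usually treated as a classification task; as a much cruder illustration, the hypothetical Python heuristic below flags a post as possibly self-reported when it contains both a first-person pronoun and an experience cue. The word lists are assumptions chosen for the example, not taken from the reviewed studies, and the last call shows how such rules can misattribute someone else’s experience to the poster.

```python
import re

# Hypothetical cue lists (illustrative assumptions only).
FIRST_PERSON = {"i", "i'm", "i've", "me", "my"}
EXPERIENCE_CUES = {"had", "have", "got", "feel", "felt", "took", "diagnosed"}


def looks_self_reported(post: str) -> bool:
    """Flag posts that pair a first-person pronoun with an experience cue."""
    tokens = set(re.findall(r"[a-z']+", post.lower()))
    return bool(tokens & FIRST_PERSON) and bool(tokens & EXPERIENCE_CUES)


print(looks_self_reported("I got the flu shot and felt dizzy"))  # True
print(looks_self_reported("CDC says flu shots are safe"))        # False
print(looks_self_reported("My friend had measles last week"))    # True (false positive)
```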

Fifth, the final link in the surveillance chain is the timely dissemination of the system’s findings to the general public or PH officials for action. Of the articles included in this review, only six (0.8%) linked their results to public health action. While there is a clear need for rigorous methodologies by which the results of DPHS systems can be converted into usable information, vigilance is still needed regarding the efficacy and safety of these findings to avoid unintended consequences for PH decisions.

Sixth, while the anonymity of Internet users enables individuals with a discreditable stigma to reap the benefits of supportive communication on digital media [43, 44], the difficulty of ascertaining demographics raises several unresolved questions regarding the inherent population biases of Internet users with different cultural backgrounds or socioeconomic status. Demographics for most digital platforms are not nationally representative and are skewed toward younger age groups and users with higher levels of education [45, 46]. None of the included studies assessed digital media utilization by vulnerable populations (e.g., low-income individuals, older adults, or people with a disability), who are underrepresented on digital platforms. Studies on detecting social bots are also scarce. Moreover, childhood obesity is increasing rapidly, with subsequent adolescent onset of nutrition-related chronic conditions such as diabetes and cardiovascular diseases [47, 48], which could be due to the massive exposure of adults and children to unhealthy food and beverages through product placements and promotional advertisements on digital platforms [49–51]; yet this topic is vastly underreported in DPHS research.

Seventh, among the 379 studies that used Twitter, Facebook, and Instagram, 41% (156) confined their analysis to content tagged with specific hashtag(s). These studies represent a biased population of users and may have skewed the data by excluding content relevant to the health event under study. Furthermore, from the full text of the 581 studies that did not use hashtags, we manually extracted the methods they employed to query the Internet or filter their collected data, and found that the majority (71%, 411) relied only on the authors’ subjective judgement and 10% (57) used the existing literature to define their search keywords. Trend analysis (i.e., Google Correlate) and ontology-based keyword extraction were used by 6% (37) and 5% (29) of the studies, respectively. Only 1% (7) of studies used automatic algorithms such as ML, NLP, or lexical analysis to extract context-sensitive keywords. Considering the rapid changes in web search behaviour, the uncertainty regarding the representativeness of pre-defined keywords, and the highly context-sensitive nature of health-related events, keyword querying alone might not be suitable for DPHS [a634].
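As an illustration of the kind of automatic lexical analysis that few studies used, the Python sketch below ranks candidate keywords by document-level pointwise mutual information (PMI) with a seed term, where PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The corpus, the seed term `"flu"`, and the `min_count` threshold are hypothetical; a real pipeline would add stop-word removal, a much larger corpus, and expert review of the candidates.

```python
import math
import re
from collections import Counter


def token_set(text):
    """Return the set of lower-case word tokens in one document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pmi_candidates(docs, seed, min_count=2):
    """Rank terms by document-level PMI with a seed keyword (highest first)."""
    doc_sets = [token_set(d) for d in docs]
    n = len(doc_sets)
    doc_freq = Counter(t for s in doc_sets for t in s)
    if doc_freq[seed] == 0:
        return []
    # Number of documents in which each term co-occurs with the seed.
    co_freq = Counter(t for s in doc_sets if seed in s for t in s if t != seed)
    scores = {}
    for term, count in co_freq.items():
        if count < min_count:
            continue  # skip rare co-occurrences, which give unstable PMI values
        p_xy = count / n
        p_x = doc_freq[seed] / n
        p_y = doc_freq[term] / n
        scores[term] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = [
    "flu fever cough today",
    "fever and cough again flu season",
    "cough syrup ad",
    "great game today",
    "flu shot today",
    "fever chills",
    "busy day today",
]
print([term for term, _ in pmi_candidates(docs, "flu")])  # ['cough', 'fever', 'today']
```

Note that a generic word such as "today" still receives a positive score, which is one reason automatically extracted keywords usually need manual vetting.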

Eighth, compounding the population bias of social media data, 82% (619) of studies analyzed only one platform, potentially leading to false positives. For example, Twitter content on poliomyelitis differs significantly from other English-language media content [a410]. Eighty-five percent (638) of studies were limited to English-language content. Given that some of the health issues addressed by the included studies may be prevalent in countries other than the USA and other countries with large English-speaking populations, this language bias can restrict the conclusions to English-speaking populations. For example, the largest burden of cervical cancer is in non-English-speaking countries, such as countries in Africa, Asia, and South America [a135], yet only English-language tweets were reviewed to study this topic.

Ninth, although the health outcomes of different PHS systems are highly location-dependent and might vary with local healthcare policies [52], the results of 36% (274) of the studies reported in this review were not segmented by geographic location, which limits the conclusiveness of their results. For example, while search engine data may be a useful tool to study the temporal dynamics of the pollen seasons in Ukraine and China [a587, a595], the agreement between search queries and pollen concentrations in France is usually poor [a588]. Similarly, in studies that investigated drug abuse in the context of varying policies, digital data were shown to be a valuable indicator of drug-related communications [a114–a116, a123]. However, this limitation is inherent in some digital platforms, such as Yelp, Reddit, and WikiTrends, which do not make the location of the poster or visitor readily available. More details about the challenges of using specific digital platforms for different PHS topics are presented in Supplementary Note 4.

DPHS and its challenges

Despite the improvements enabled by digital technologies, the overall process of PHS research has remained constant and comprises five main systematic and iterative activities [9, 53]. Figure 8 illustrates the overall process of DPHS and summarizes the limitations of existing research on DPHS discussed earlier by mapping them to the different activities of this process. During the course of this review, we found that the main differences between traditional PHS and DPHS lie in how and for what purposes the data are generated and used (highlighted in Fig. 8). Following the definition of digital surveillance data used to set the scope of this review, a DPHS system uses digital data voluntarily generated by the public, independently of the objectives of the surveillance task at hand. Digital data generated through online surveys or polls with a pre-defined surveillance goal, or digital content that is not publicly available, cannot be considered digital surveillance data. This methodological difference between traditional and digital PHS systems helps explain the challenges mapped to the different DPHS activities (listed in Fig. 8). Data source bias (e.g., limited platforms and content/population bias), data collection limitations (e.g., subjective filtering), the difficulty of analyzing complex, unstructured digital data, and the lack of sensitivity analyses for evaluating DPHS systems (owing to the limitations of mapping digital data to national and real-world data) are some of the key challenges that still need to be addressed in future work.

Limitations of the scoping review

This study has some limitations. First, the terminology of DPHS is not yet used consistently, and our search strings may not have captured all the existing evidence. To mitigate this, in addition to a literature review and the involvement of domain experts, we used language modelling and lexical analysis to find the context-sensitive terms that represent the field. Second, papers excluded under our criteria may yet prove relevant to DPHS, even though eligibility decisions were made by three reviewers. Finally, although we have tried to discuss some of the most important findings in the literature through intuitive and detailed visualizations, it is impossible in a limited space to detail every aspect of the studies that used digital media for PHS. The supplementary dashboard presented alongside this study provides more interactive results. Nevertheless, we believe that a more broadly based review of each of the surveillance systems presented in this paper would provide necessary context for DPHS.

Methods

Search strategy and selection criteria

For this scoping review, we searched Global Health, Web of Science, and PubMed for articles published in English up to January 2020. For each search string, we also searched the first ten pages of Google Scholar results (20 results per page) to ensure that we had included all highly cited articles relevant to the scope of our review. To define the search strings for the automated search, we used a literature review, manual content analysis, and Natural Language Processing (NLP), including language modelling (i.e., the probability of a given sequence of words in a document) and lexical association analysis (i.e., the co-occurrence of words), to explore the context-sensitive terms relating to DPHS (Supplementary Note 1.1 and Supplementary Table 1). The reference lists of the included articles were also screened for additional relevant studies not identified during the automated search. To assess the performance of the developed search strategy, the sensitivity of more than 200 search strings was tested against a quasi gold standard [54] set of 80 articles. These articles were selected manually from studies published in four public health journals from 2017 to 2018 (Supplementary Note 1.2 and Supplementary Table 2).
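The sensitivity check described above reduces to a simple ratio: the share of quasi-gold-standard (QGS) articles that a search string retrieves, i.e. |retrieved ∩ QGS| / |QGS|. The Python sketch below computes and ranks that ratio for several search strings; the article IDs and string names are hypothetical, and the review’s actual procedure is described in Supplementary Note 1.2.

```python
def search_sensitivity(retrieved_ids, qgs_ids):
    """Fraction of quasi-gold-standard articles retrieved by one search string."""
    qgs = set(qgs_ids)
    if not qgs:
        raise ValueError("the quasi gold standard set is empty")
    return len(qgs & set(retrieved_ids)) / len(qgs)


def rank_search_strings(results, qgs_ids):
    """Sort search strings by sensitivity (highest first), breaking ties by name."""
    scored = [(name, search_sensitivity(ids, qgs_ids)) for name, ids in results.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical example: a four-article QGS and three candidate search strings.
qgs = ["a1", "a2", "a3", "a4"]
results = {
    "string_A": ["a1", "a2", "x9"],
    "string_B": ["a1", "a2", "a3", "x7"],
    "string_C": [],
}
print(rank_search_strings(results, qgs))
# [('string_B', 0.75), ('string_A', 0.5), ('string_C', 0.0)]
```

Sensitivity alone ignores how many irrelevant records a string returns, so in practice it is usually weighed against the screening workload.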

We included all studies published in English that investigated digital data to implement a surveillance system directly (infoveillance) or that mined, analyzed, and aggregated information from digital resources to inform PH and public policy for PHS purposes (infodemiology). Digital data in this paper, regardless of type, refer to publicly available user-contributed content on the Internet that was not generated with the main purpose of supporting PHS [25]. Digital data sources can be categorized into social networking sites (e.g., Facebook, Twitter); Internet search data (e.g., Google (Flu) Trends); collaborative websites (e.g., Wikipedia); content-sharing websites (e.g., YouTube, news websites); and blogs and forums (e.g., Reddit, Yelp) [55]. Thus, we excluded all PHS studies that actively collected data through online surveys, digital polls, or interviews. Moreover, articles that used digital data for personal surveillance (i.e., monitoring potentially exposed individuals to detect early symptoms [35]) were excluded from this review. We also excluded studies that used digital data for purposes other than PHS. For example, studies that reported on leveraging the social structures of digital platforms for health education and research recruitment, or studies that only contributed to developing new ML techniques for PHS, were not eligible for inclusion. Full details of the inclusion and exclusion criteria are listed in Supplementary Note 1.4.

The titles and abstracts of the articles identified by the search strategy were screened manually and independently by three reviewers for eligibility according to the inclusion and exclusion criteria. Disagreements about eligibility were settled by discussion among the three reviewers. One reviewer then manually assessed the full text of each included publication and identified additional papers that did not meet the eligibility requirements.

Data analysis

A data extraction form was developed and independently piloted on 50 publications by three reviewers. Seven reviewers extracted data from the included articles and two reviewers manually reviewed all fields of the data extraction form and resolved discrepancies by reviewing the full text of the included studies. The following data were extracted from the included papers: authors’ affiliation, number of authors, year of publication, country of authors, country of data collection, platform(s) under study, surveillance theme and (sub) category, objective and findings, the temporal trend of data analysis, surveillance type, age/gender/place mapped to the data, the language of data, analysis methods (i.e., quantitative, qualitative, machine learning), data type (e.g., text, image, video, and search query), duration/start of data collection, evaluation methods, and the methodology of using digital resources for PHS.

To summarize the data extracted from the included articles, we used a descriptive-analytical method to extract contextual and process-oriented information from each study [56]. A qualitative analysis was also conducted using NVivo 10 [57], a software programme for qualitative analysis, to chart the descriptive results and findings of the included studies. We tabulated a hierarchy of the digital surveillance systems reported by the included studies and used narrative visualizations to report the findings of this review. We also developed an interactive visual dashboard (available at https://rpubs.com/zshakeri/dphs_dashboard) to provide insights into the findings with a multidimensional and more granular conceptual structure that is difficult to articulate in text alone. More details about the dashboard are provided in Supplementary Note 2.

As the primary purpose of this study was to profile the scientific literature on Internet-based user-generated data in the PHS context, we did not critically appraise the methodological quality of the included studies. However, we have commented on the methodological limitations that could have affected their results and implications.

Acknowledgements

This work was supported by a postdoctoral scholarship from the Libin Cardiovascular Institute and the Cumming School of Medicine, University of Calgary. Also, this work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (RGPIN-2014-04743) and funding from the O’Brien Institute for Public Health, University of Calgary.

Author contributions

Z.S.H.A. led the development and implementation of the protocol, designed the search strategy and retrieved articles, designed the data extraction process, screened the search results, extracted data, performed the data analysis, interpreted and visualized the results, developed the dashboard, and led the writing of the manuscript. M.N. contributed to data collection and data analysis, and critically reviewed the results. A.K. and M.S. contributed to screening the search results and extracting data. A.K. contributed to the interpretation of clinical results. E.N., F.L., and M.A. contributed to data extraction. Z.S.H.A. and M.N. reviewed the extracted data and resolved discrepancies by reviewing the full text of the included studies. J.L. conceived the study, contributed to the protocol development, and critically reviewed the results and the manuscript. All authors read and approved the final manuscript.

Data availability

All data generated or analyzed during this review are included in this article and its supplementary information files.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41746-021-00407-6.

References

  • 1.Center, P. R. Internet/broadband fact sheet. https://www.pewresearch.org/internet/fact-sheet/social-media/ (2019). Accessed on July 2020.
  • 2.Index, G. W. Global Web Index’s Flagship Report on the Latest Trends in Social Media (GlobalWebIndex (GWI), New York City, 2018).
  • 3.Fung IC-H, Tse ZTH, Fu K-W. The use of social media in public health surveillance. Western Pac. Surveill. Response J. 2015;6:3. doi: 10.5365/wpsar.2015.6.1.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Brownstein JS, Freifeld CC, Madoff LC. Digital disease detection—harnessing the web for public health surveillance. N. Engl. J. Med. 2009;360:2153. doi: 10.1056/NEJMp0900702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Kass-Hout TA, Alhinnawi H. Social media in public health. Br. Med. Bull. 2013;108:5–24. doi: 10.1093/bmb/ldt028. [DOI] [PubMed] [Google Scholar]
  • 6.Salathé M. Digital epidemiology: what is it, and where is it going? Life Sci. Soc. Policy. 2018;14:1. doi: 10.1186/s40504-017-0065-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Jamison, D. T. et al. Disease Control Priorities in Developing Countries (The World Bank, 2006). [PubMed]
  • 8.Thacker SB, et al. Public health surveillance in the united states: evolution and challenges. MMWR Surveill. Summ. 2012;61:3–9. [PubMed] [Google Scholar]
  • 9.Teutsch, S. M. Considerations in planning a surveillance system. Princibles and Practice of Public Health Surveillance 18–28 (Oxford University Press, New York, NY, 2010).
  • 10.Salathe M, et al. Digital epidemiology. PLoS Comput. Biol. 2012;8:e1002616. doi: 10.1371/journal.pcbi.1002616. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Eysenbach G. Infodemiology and infoveillance: tracking online health information and cyberbehavior for public health. Am. J. Prevent. Med. 2011;40:S154–S158. doi: 10.1016/j.amepre.2011.02.006. [DOI] [PubMed] [Google Scholar]
  • 12.Eysenbach G. Infodemiology: the epidemiology of (mis)information. Am. J. Med. 2002;113:763–765. doi: 10.1016/S0002-9343(02)01473-0. [DOI] [PubMed] [Google Scholar]
  • 13.Zeraatkar K, Ahmadi M. Trends of infodemiology studies: a scoping review. Health Inf. Librar. J. 2018;35:91–120. doi: 10.1111/hir.12216. [DOI] [PubMed] [Google Scholar]
  • 14.Ward JK, Peretti-Watel P, Verger P. Vaccine criticism on the internet: propositions for future research. Hum.Vaccines & Immunother. 2016;12:1924–1929. doi: 10.1080/21645515.2015.1095415. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Freifeld CC, et al. Digital drug safety surveillance: monitoring pharmaceutical products in twitter. Drug Safe. 2014;37:343–350. doi: 10.1007/s40264-014-0155-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Carneiro HA, Mylonakis E. Google trends: a web-based tool for real-time surveillance of disease outbreaks. Clin. Infect. Dis. 2009;49:1557–1564. doi: 10.1086/630200. [DOI] [PubMed] [Google Scholar]
  • 17.Nuti SV, et al. The use of google trends in health care research: a systematic review. PLoS ONE. 2014;9:e109583. doi: 10.1371/journal.pone.0109583. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Nicholls J. Everyday, everywhere: alcohol marketing and social media-"current trends. Alcohol Alcohol. 2012;47:486–493. doi: 10.1093/alcalc/ags043. [DOI] [PubMed] [Google Scholar]
  • 19.Naslund JA, et al. Systematic review of social media interventions for smoking cessation. Addict. Behav. 2017;73:81–93. doi: 10.1016/j.addbeh.2017.05.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Bernardo TM, et al. Scoping review on search queries and social media for disease surveillance: a chronology of innovation. J. Med. Internet Res. 2013;15:e147. doi: 10.2196/jmir.2740. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Sinnenberg L, et al. Twitter as a tool for health research: a systematic review. Am. J. Public Health. 2017;107:e1–e8. doi: 10.2105/AJPH.2016.303512. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Velasco E, Agheneza T, Denecke K, Kirchner G, Eckmanns T. Social media and internet-based data in global systems for public health surveillance: a systematic review. Milbank Q. 2014;92:7–33. doi: 10.1111/1468-0009.12038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Fung I, et al. Ebola virus disease and social media: a systematic review. Am J. Infect. Control. 2016;44:1660–1671. doi: 10.1016/j.ajic.2016.05.011. [DOI] [PubMed] [Google Scholar]
  • 24.Capurro D, et al. The use of social networking sites for public health practice and research: a systematic review. J. Med. Internet Res. 2014;16:1–14. doi: 10.2196/jmir.2517. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Park H, Jung H, On J, Park SK, Kang H. Digital epidemiology: use of digital data collected for non-epidemiological purposes in epidemiological studies. Healthc. Inform. Res. 2018;24:253–262. doi: 10.4258/hir.2018.24.4.253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Mavragani A. Infodemiology and infoveillance: scoping review. J. Med. Internet Res. 2020;22:e16206. doi: 10.2196/16206. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Edo-Osagie, O., De La Iglesia, B., Lake, I. & Edeghere, O. A scoping review of the use of twitter for public health research. Comput. Biol. Med.122, 1–13 (2020). [DOI] [PMC free article] [PubMed]
  • 28.Aiello A, Renson A, Zivich P. Social media—and internet-based disease surveillance for public health. Annu. Rev. Public Health. 2020;2020:101–118. doi: 10.1146/annurev-publhealth-040119-094402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Tricco AC, et al. Prisma extension for scoping reviews (prisma-scr): checklist and explanation. Ann. Intern Med. 2018;169:467–473. doi: 10.7326/M18-0850. [DOI] [PubMed] [Google Scholar]
  • 30.Peters MD, et al. Guidance for conducting systematic scoping reviews. Int. J. Evid. Based Healthc. 2015;13:141–146. doi: 10.1097/XEB.0000000000000050. [DOI] [PubMed] [Google Scholar]
  • 31.Arksey H, O’Malley L. Scoping studies: towards a methodological framework. Int. J. Soc. Res. Methodol. 2005;8:19–32. doi: 10.1080/1364557032000119616. [DOI] [Google Scholar]
  • 32.Levac D, Colquhoun H, O’Brien KK. Scoping studies: advancing the methodology. Implement. Sci. 2010;5:69. doi: 10.1186/1748-5908-5-69. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Marynak K, et al. State laws prohibiting sales to minors and indoor use of electronic nicotine delivery systems-"united states, november 2014. Morbid. Mortal. Wkly Rep. 2014;63:1145. [PMC free article] [PubMed] [Google Scholar]
  • 34.Organization, W. H. et al. Electronic nicotine delivery systems. Report by WHO (WHO, 2014).
  • 35.Declich S, Carter AO. Public health surveillance: historical origins, methods and evaluation. Bull. World Health Organ. 1994;72:285. [PMC free article] [PubMed] [Google Scholar]
  • 36.Golder A, G. N, Y. L. Systematic review on the prevalence, frequency and comparative value of adverse events data in social media. Br. J. Clin. Pharmacol. 2015;80:878–888. doi: 10.1111/bcp.12746. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Wehner MR, Nead KT. Can google help us fight cancer? Lancet Oncol. 2018;19:867. doi: 10.1016/S1470-2045(18)30296-1. [DOI] [PubMed] [Google Scholar]
  • 38.Hahn RA, Stroup DF. Race and ethnicity in public health surveillance: criteria for the scientific use of social categories. Public Health Rep. 1994;109:7. [PMC free article] [PubMed] [Google Scholar]
  • 39.Aiello AE, Renson A, Zivich PN. Social media–and internet-based disease surveillance for public health. Annu. Rev. Public Health. 2020;41:101–118. doi: 10.1146/annurev-publhealth-040119-094402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.German, R. R., Horan, J. M., Lee, L. M., Milstein, B. & Pertowski, C. A. Updated Guidelines for Evaluating Public Health Surveillance Systems; Recommendations from the Guidelines Working Group (MMWR Recomm Rep., 2001). [PubMed]
  • 41.UNODC. World Drug Report (United Nations Office on Drugs and Crime, 2016).
  • 42.Chou W-YS, Oh A, Klein WM. Addressing health-related misinformation on social media. JAMA. 2018;320:2417–2418. doi: 10.1001/jama.2018.16865. [DOI] [PubMed] [Google Scholar]
  • 43.Powell J, Darvell M, Gray J. The doctor, the patient and the world-wide web: how the internet is changing healthcare. J. R. Soc. Med. 2003;96:74–76. doi: 10.1177/014107680309600206. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Yeshua-Katz D, Martins N. Communicating stigma: the pro-ana paradox. Health Commun. 2013;28:499–508. doi: 10.1080/10410236.2012.699889. [DOI] [PubMed] [Google Scholar]
  • 45.Kaplan AM, Haenlein M. Users of the world, unite! the challenges and opportunities of social media. Bus. Horiz. 2010;53:59–68. doi: 10.1016/j.bushor.2009.09.003. [DOI] [Google Scholar]
  • 46. Sadah SA, Shahbazi M, Wiley MT, Hristidis V. A study of the demographics of web-based health-related social media users. J. Med. Internet Res. 2015;17:e194. doi: 10.2196/jmir.4308.
  • 47. Sanou D, et al. Acculturation and nutritional health of immigrants in Canada: a scoping review. J. Immigr. Minor. Health. 2014;16:24–34. doi: 10.1007/s10903-013-9823-7.
  • 48. Smith KB, Smith MS. Obesity statistics. Prim. Care. 2016;43:121–135. doi: 10.1016/j.pop.2015.10.001.
  • 49. Olstad DL, Lee J. Leveraging artificial intelligence to monitor unhealthy food and brand marketing to children on digital media. Lancet Child Adolesc. Health. 2020;4:418–420. doi: 10.1016/S2352-4642(20)30101-2.
  • 50. Dunlop S, Freeman B, Jones SC. Marketing to youth in the digital age: The promotion of unhealthy products and health promoting behaviours on social media. Media Commun. 2016;4:35–49. doi: 10.17645/mac.v4i3.522.
  • 51. Potvin Kent M, Pauzé E, Roy E-A, de Billy N, Czoli C. Children and adolescents’ exposure to food and beverage marketing in social media apps. Pediatr. Obes. 2019;14:e12508. doi: 10.1111/ijpo.12508.
  • 52. Croner CM. Public health, GIS, and the Internet. Annu. Rev. Public Health. 2003;24:57–82. doi: 10.1146/annurev.publhealth.24.012902.140835.
  • 53. Choi BC. The past, present, and future of public health surveillance. Scientifica. 2012;2012:1–26. doi: 10.6064/2012/875253.
  • 54. Golder S, McIntosh HM, Duffy S, Glanville J. Developing efficient search strategies to identify reports of adverse effects in MEDLINE and Embase. Health Info. Libr. J. 2006;23:3–12. doi: 10.1111/j.1471-1842.2006.00634.x.
  • 55. Kaplan AM, Haenlein M. Users of the world, unite! The challenges and opportunities of social media. Bus. Horiz. 2010;53:59–68. doi: 10.1016/j.bushor.2009.09.003.
  • 56. Levac D, Colquhoun H, O’Brien KK. Scoping studies: advancing the methodology. Implement. Sci. 2010;5:69. doi: 10.1186/1748-5908-5-69.
  • 57. Bazeley, P. & Jackson, K. Qualitative Data Analysis with NVivo (SAGE Publications Limited, 2013).

Data Availability Statement

All data generated or analyzed during this review are included in this article and its supplementary information files.


Articles from NPJ Digital Medicine are provided here courtesy of Nature Publishing Group