Yearbook of Medical Informatics. 2019 Aug 16;28(1):208–217. doi: 10.1055/s-0039-1677918

Recent Advances in Using Natural Language Processing to Address Public Health Research Questions Using Social Media and Consumer-Generated Data

Mike Conway 1, Mengke Hu 1, Wendy W Chapman 1
PMCID: PMC6697505  PMID: 31419834

Summary

Objective: We present a narrative review of recent work on the utilisation of Natural Language Processing (NLP) for the analysis of social media (including online health communities) specifically for public health applications.

Methods: We conducted a literature review of NLP research that utilised social media or online consumer-generated text for public health applications, focussing on the years 2016 to 2018. Papers were identified in several ways, including PubMed searches and the inspection of recent conference proceedings from the Association for Computational Linguistics (ACL), the Conference on Human Factors in Computing Systems (CHI), and the International AAAI (Association for the Advancement of Artificial Intelligence) Conference on Web and Social Media (ICWSM). Popular data sources included Twitter, Reddit, various online health communities, and Facebook.

Results: In the recent past, communicable diseases (e.g., influenza, dengue) have been the focus of much social media-based NLP health research. However, mental health and substance use and abuse (including the use of tobacco, alcohol, marijuana, and opioids) have been the subject of an increasing volume of research in the 2016-2018 period. Associated with this trend, the use of lexicon-based methods remains popular given the availability of psychologically validated lexical resources suitable for mental health and substance abuse research. Finally, we found that in the period under review “modern” machine learning methods (i.e. deep neural-network-based methods), while increasing in popularity, remain less widely used than “classical” machine learning methods.

Keywords: Natural Language Processing, text mining, social media, public health

1 Introduction

Social media is a valuable source of data for public health research. It is estimated that 75% of Internet users have read or watched online health information content, and 26% of Internet users have posted (or shared) their personal health information online 1. This large-scale sharing of health information makes social media and Online Health Communities (OHC) a valuable and abundant source of data for addressing public health questions. Social media – including online consumer generated OHC data – provide a ready source of timely, abundant data that can serve as a valuable resource for several broad types of public health applications, including surveillance, health communication, sentiment analysis, and understanding the natural history of a disease, injury, or health behaviour. Research on utilising social media in conjunction with Natural Language Processing (NLP) for public health applications is a robust and growing area of study, with dedicated meetings 1 and a now well-established research community 2. Regarding surveillance, the importance of mental health and substance abuse surveillance is increasingly recognised 3. This growth is unsurprising given that it is estimated that mental health and substance abuse constitute approximately 10.4% of the global burden of disease and are the leading cause of years lived with disability, imposing direct and indirect costs on the world economy of around US$2.5 trillion 4. The study of health communication is another area of research that uses social media in conjunction with NLP methods, particularly in the area of understanding and quantifying vaccine hesitancy and refusal. NLP can support public health researchers in identifying common health-related misconceptions, and in turn, devising more effective health communication methods 5. Similarly, sentiment analysis with respect to products relevant to public health (e.g. marijuana-related products, e-cigarettes) and the health behaviours that they facilitate is a further area of research 6. Finally, social media provide a valuable data source for studies focussed on understanding and analysing the natural history of a disease, illness or injury, especially in the context of new and re-emerging diseases and rapid changes in health behaviour 7.

The key changes we have observed since 2016 – apart from the growth in research related to mental health and substance abuse and the increasing interest in “modern” machine learning methods – include a move towards integrating social media analysis with the Electronic Health Record (EHR) 8, in part as a means of obtaining valuable diagnostic “ground truth”. A further shift of note is the increased interest in elucidating ethical issues in the application of NLP (and machine learning more generally) to social media for public health applications, particularly with respect to protecting the rights of those users suffering from potentially stigmatising conditions 9.

Challenges in developing high performance NLP methods for social media have been extensively enumerated. In summary, major outstanding problems include the use of non-standard grammar, the use of rapidly changing and often non-standard slang terms, spelling variation in informal consumer-generated text, the rapidly changing nature of social media language, and the identification (and filtering) of jokes, memes, and advertising 2.

In this paper, we review literature from the period 2016-2018 regarding the application of NLP methods to social media data as a means of addressing public health research questions, focussing specifically on new application areas and the adoption of new methods. A distinctive feature of this review is an emphasis on the increasing volume of research focussed on ethics-related issues involved in using consumer-generated data for public health research.

2 Methods

Our paper selection process involved the following steps. First, we searched PubMed, the Association for Computational Linguistics Anthology, the Proceedings of the Conference on Human Factors in Computing Systems (CHI), and the Proceedings of the International AAAI (Association for the Advancement of Artificial Intelligence) Conference on Web and Social Media (ICWSM) using a variety of social media and NLP-related keywords. Second, we manually inspected Tables of Contents for the Journal of the American Medical Informatics Association, the Journal of Biomedical Informatics, and the Journal of Medical Internet Research. In this first pass, over 1,800 papers were identified. After reviewing abstracts, we reduced the number of papers reviewed to 130. In order to increase the tractability of the reviewing task, we further winnowed the papers to 71. This winnowing process was designed to capture a large swathe of both application areas and methods, and cannot be interpreted as a comment on the quality of research.

Only the papers that both demonstrated a clear public health focus and explicitly utilised NLP or text mining methods were retained. Papers that reported on the results of qualitative content analysis or professional standards for health communication using social media without reference to NLP were excluded. Papers that discussed ethical issues pertaining to the use of social media for public health applications and research were retained. References dated outside the period 2016-2018 have been included in order to provide important context. The use of these references does not imply that they form part of the document set defined by the inclusion criteria.

The papers reviewed utilise social media from several different sources, including Twitter, Reddit, Weibo, Facebook, and online discussion forums (see Figure 1 and Tables 1 & 2 ).

Fig. 1. Social media data sources. Note that this list is not exhaustive.

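Table 1. Number of papers by topic and data source. Note that papers can occur in several categories.

| Data Source | Vac (a) | Comm (b) | Cancer (c) | SA (d) | Pharmaco (e) | STI (f) | MH (g) | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Reddit | - | 1 | - | 3 | - | 1 | 13 | 18 |
| Twitter | 3 | 3 | 1 | 17 | 7 | 1 | 9 | 41 |
| Instagram | - | - | - | - | - | - | 1 | 1 |
| Facebook | 1 | - | - | - | - | - | 3 | 4 |
| OHC (h) | 1 | - | 2 | 2 | 1 | - | 6 | 12 |
| Weibo | - | 1 | - | - | - | - | 1 | 2 |
| WhatsApp | - | - | - | 1 | - | - | - | 1 |
| Youtube | - | - | - | 1 | - | - | - | 1 |
| Yik-Yak | - | - | - | 1 | - | - | - | 1 |
| Tumblr | - | - | - | - | - | - | 1 | 1 |

(a) Vaccination hesitancy and refusal; (b) Health communication; (c) Cancer; (d) Substance Abuse; (e) Pharmacovigilance; (f) Sexually transmitted infections; (g) Mental health; (h) Online Health Communities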

Table 2. Data Sources and Topics. Note that ethics-related papers are excluded from this table as they are frequently concerned with social media in general.

| Data Source | Vac (a) | Comm (b) | Cancer | SA (c) | Pharmaco (d) | STI (e) | MH (f) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Reddit | - | 10 | - | [11-13] | - | 14 | [15-27] |
| Twitter | [28-30] | [31-33] | 34 | [6, 12, 35-49] | [50-56] | 57 | [18, 58-65] |
| Instagram | - | - | - | - | - | - | 18 |
| Facebook | 66 | - | - | - | - | - | [8, 18, 67] |
| OHC (g) | 5 | - | [68, 69] | [12, 13] | 50 | - | [70-75] |
| Weibo | - | 32 | - | - | - | - | 76 |
| Tumblr | - | - | - | - | - | - | 18 |

(a) Vaccination hesitancy and refusal; (b) Communicable diseases; (c) Substance Abuse; (d) Pharmacovigilance; (e) Sexually transmitted infections; (f) Mental health; (g) Online Health Communities

The vast majority of the papers reviewed focussed on analysing English language text (68 papers), with two papers focussing on Chinese text 76 , 77 and one paper focussing on Japanese text 31 . With respect to the geographical location of first authors, most of the articles emerged from North America (55), with Europe (7), and Asia (including Australasia and Turkey) (6) all represented.

The reviewed papers can be grouped into several health-related categories, including vaccine hesitancy and refusal, communicable diseases surveillance (including sexually transmitted infections [STIs]), cancer, substance abuse, pharmacovigilance, and mental health (see Table 2). A wide range of methods were used, including “classical” machine learning (e.g., Random Forests, Support Vector Machines [SVM]), “modern” machine learning (e.g., Convolutional Neural Networks [CNN], Recurrent Neural Networks [RNN] 2 ), and lexicon-based approaches. Among the lexicon-based approaches, the Linguistic Inquiry and Word Count (LIWC) lexicon, a dictionary of words arranged into numerous psychological dimensions, is used extensively in many of the papers reviewed, especially in the areas of mental health and substance abuse 79.
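To make the lexicon-based approach concrete, the following is a minimal sketch of how a category lexicon can be applied to a post. The lexicon, its category names, and the function `category_proportions` are hypothetical and chosen only for illustration; they are not the LIWC dictionary (a licensed resource) and do not reproduce the method of any reviewed paper.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; this is NOT the LIWC dictionary.
TOY_LEXICON = {
    "negative_emotion": {"sad", "hopeless", "anxious", "afraid"},
    "positive_emotion": {"happy", "glad", "relieved"},
    "first_person": {"i", "me", "my"},
}


def category_proportions(text, lexicon=TOY_LEXICON):
    """Return, for each category, the share of tokens in `text` found in that category."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed inside words).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    return {c: (counts[c] / total if total else 0.0) for c in lexicon}


scores = category_proportions("I feel sad and hopeless, but my sister is happy.")
for category, share in scores.items():
    print(f"{category}: {share:.2f}")
```

Normalising by the number of tokens makes scores comparable across posts of different lengths, which is one common design choice when such scores are later used as features or compared between user groups.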

3 Results

3.1 Vaccine Hesitancy and Refusal

Vaccine hesitancy – defined by the World Health Organisation as referring to a “delay in acceptance or refusal of vaccines despite availability of vaccination services” 3 – has been a growing subject of research during the review period, with NLP methods applied to social media data in an attempt to develop insights into how best to understand and improve health communication as well as quantifying the degree of vaccine hesitancy in a community.

Of the five papers reviewed in this section (see Table 3), three utilised Twitter data [28-30], one utilised Facebook data 66, and one further paper utilised data derived from an online health community, in this case mothering.com 5. Supervised machine learning 30 and unsupervised machine learning 5, 28, 29 were both represented. Three of the papers reviewed used classical machine learning methods 5, 29, 30, and one used modern machine learning methods 30, with surveillance [28-30], health communication [5, 28-30, 66], and sentiment analysis [28-30, 66] all frequently studied topics. The LIWC lexicon has been used either to characterise public attitudes towards vaccination in general 66, or as a tool to explore the purported link between autism and the Measles, Mumps, and Rubella vaccine 28. This last study aimed at investigating key differences between users who are longstanding vaccination advocates, longstanding anti-vaccination advocates, or users who had recently adopted an anti-vaccination orientation. Vaccination to protect against the Human Papillomavirus (HPV) – a vaccine typically administered to adolescent boys and girls to prevent future sexual transmission of the disease – was also the subject of reviewed research, with high performance sentiment classifiers developed (AUC: 0.92) 30, and LDA (Latent Dirichlet Allocation) topic modeling used to identify a number of vaccine-hesitancy-related topics, including clinical evidence and vaccination harms 29.
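For readers less familiar with the AUC figure quoted for the sentiment classifier, the sketch below shows one way the area under the ROC curve can be computed: as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `roc_auc`, the labels, and the scores are illustrative assumptions, not data from the reviewed study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` holds 1 for the positive class and 0 for the negative class;
    `scores` holds the classifier's score for each example.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and classifier scores, for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of examples and is used here only because it makes the definition explicit; library implementations typically use a sort-based computation instead.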

Table 3. Summary of vaccine-related papers.

| Data Source | SML (a) | UML (b) | CML (c) | MML (d) | Surv (e) | HC (f) | Senti (g) | Lexicon (h) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Twitter | 30 | [28, 29] | [29, 30] | 30 | [28-30] | [28-30] | [28-30] | 28 |
| Facebook | - | - | - | - | - | 66 | 66 | 66 |
| OHC (i) | - | 5 | 5 | - | - | 5 | - | - |

(a) Supervised machine learning (e.g., Support Vector Machines, Random Forests); (b) Unsupervised machine learning (e.g., Latent Dirichlet Allocation, K-means); (c) Classical machine learning (e.g., Random Forests, Support Vector Machines); (d) Modern machine learning (e.g., Convolutional Neural Networks); (e) Surveillance; (f) Health communication; (g) Sentiment analysis; (h) Lexicon-based methods; (i) Online health communities

In a further example of novel research, Tangherlini et al. produced a statistical-mechanical network model representing relationships between “actants” (actors) that is used to automatically extract typical narratives and “story fragments” related to vaccination issues, evidencing a narrative framework related to a pronounced distrust of government and medical authority 5.

3.2 Communicable Diseases and Sexually Transmitted Infections

Systems designed to use social media data for pandemic public health surveillance have existed for almost 13 years 80, 81, and approaches that are variously referred to as infodemiology 82, digital disease detection 83, and digital epidemiology 84 are by now well established, particularly for dengue, influenza, and more recently, Ebola. In addition, significant research efforts have centered on the study of STIs, despite some methodological concerns regarding the willingness of users with STIs to disclose their status on social media.

Investigating the changing prevalence of a number of health-related topics, Park et al. 10 observed that Ebola discussions were characterised by concerns about risks and symptoms, while influenza was associated with terms like “CDC” and “H1N1”. Another study focussed on influenza misdiagnoses 33, achieving an F-score of 0.76. Regarding STIs, one study demonstrated statistically significant associations between Twitter data from 2012 and official Centers for Disease Control syphilis prevalence data from 2013 57, with a related study discovering that the most frequent STIs discussed were intermediate (non-reportable) STIs like genital herpes and HPV, with more serious (reportable) diseases like syphilis and gonorrhoea discussed less frequently 14.
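As a reminder of what an F-score such as the one quoted above summarises, the sketch below computes precision, recall, and their harmonic mean (the F1 variant) from confusion counts. The function name `precision_recall_f1` and the counts are hypothetical; they were chosen only so that the result lands on 0.76 and are not taken from the cited study, which may also have used a different F-score variant.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts. Undefined ratios are returned as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical confusion counts, for illustration only.
precision, recall, f1 = precision_recall_f1(tp=38, fp=12, fn=12)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```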

Of the six papers reviewed (see Table 4), four used Twitter data [31-33, 57], and two used Reddit data 10, 14, while Al-Garadi et al. provided a review that concentrated on Twitter and Weibo, the Chinese language microblog service 32. Two of the papers reviewed described the use of supervised machine learning methods 31, 32, three papers used unsupervised machine learning methods 10, 14, 32, and one used a lexicon-based approach 57. Machine learning methods were used to perform a variety of tasks, including surveillance [10, 14, 31-33, 57], health communication 32, and sentiment analysis 32. Several studies concentrated on influenza surveillance using English 10, 33 and Japanese 31 Twitter data.

Table 4. Summary of communicable diseases and STI-related papers.

| Data Source | SML (a) | UML (b) | CML (c) | MML (d) | Surv (e) | HC (f) | Senti (g) | Lexicon (h) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Reddit | - | [10, 14] | [10, 14] | - | [10, 14] | - | - | - |
| Twitter | [31, 32] | 32 | [31-33] | - | [31-33, 57] | 32 | 32 | 57 |
| Weibo | 32 | 32 | 32 | - | 32 | 32 | 32 | - |

(a) Supervised machine learning; (b) Unsupervised machine learning; (c) Classical machine learning; (d) Modern machine learning; (e) Surveillance; (f) Health communication; (g) Sentiment analysis; (h) Lexicon-based methods

3.3 Cancer

Work on using NLP and text-mining methods to understand issues directly related to cancer (diagnosis, treatment, and management) is less well developed than some of the other areas considered in this review (e.g., mental health and substance abuse). Of the three cancer-related papers reviewed (see Table 5), one utilised Twitter data 34, and two utilised data derived from an online health community 68, 69. All the papers discussed used both classical and modern machine learning methods, with modern machine learning methods performing better than classical machine learning methods, albeit by a narrow margin in the case of Zhang et al.’s work on identifying chemotherapy-related Twitter accounts by account type 34. Zhang et al. observed that Twitter accounts belonging to individuals focussed on “personal chemotherapy experience and emotions”, whereas professional accounts typically provided a neutral presentation of chemotherapy side effects 34. Two of the papers were centred on health communication, broadly conceived 68, 69, with one paper focusing on sentiment analysis 34. Concentrating specifically on the patient experience of breast cancer, one study 68 aimed at characterizing how forum topics changed over time depending on the individual’s time since diagnosis and cancer state, and found that diagnosis is the most frequent class in the early stages of cancer treatment, with diagnosis (and treatment) related discussions declining over the course of a user’s cancer journey.

Table 5. Summary of cancer-related papers.

| Data Source | SML (a) | UML (b) | CML (c) | MML (d) | Surv (e) | HC (f) | Senti (g) | Lexicon (h) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Twitter | 34 | 34 | 34 | 34 | - | - | 34 | - |
| OHC (i) | [68, 69] | 68 | [68, 69] | [68, 69] | - | [68, 69] | - | - |

(a) Supervised machine learning; (b) Unsupervised machine learning; (c) Classical machine learning; (d) Modern machine learning; (e) Surveillance; (f) Health communication; (g) Sentiment analysis; (h) Lexicon; (i) Online Health Communities

3.4 Substance Abuse

This section is concerned with reviewing work centred on the use of social media, in conjunction with NLP methods, to address substance abuse research questions, focussing on opioid abuse, tobacco, e-cigarette and marijuana use, and alcohol abuse. Interesting work on drug abuse – particularly new and emerging products – is increasingly evident in the literature. NLP methods are needed to deal with ambiguity and colloquial expressions used on social media (such as “bath salts”, “kitty cat”, or “miaow miaow” for mephedrone 44 ).

Of the twenty-two papers discussed in this section, three are focussed on opioid abuse [35, 41, 42], eight on tobacco and marijuana use [6, 12, 13, 40, 43, 45, 46, 49], one on alcohol abuse 36, and one on the street drug, mephedrone 44. Twitter is the most popular source of data (18 papers) [6, 11, 12, 35-49], with Reddit [11-13] and online health communities 12, 13 both represented. Supervised machine learning (8 papers, all utilising Twitter data) and unsupervised machine learning (11 papers) were both evident in the reviewed papers, with classical machine learning approaches more common than modern neural-network-based approaches (17 and 2 papers, respectively). Two of the papers reviewed utilized a rule-based approach. Table 6 summarises the reviewed substance abuse-related papers.

Table 6. Summary of substance abuse-related papers.

| Data Source | SML (a) | UML (b) | CML (c) | MML (d) | Surv (e) | HC (f) | Senti (g) | Lexicon (h) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Reddit | - | [11-13] | [11-13] | - | 12 | - | - | 13 |
| Twitter | [6, 36, 40, 45-49] | [6, 12, 35, 37, 39, 41, 42, 43, 45] | [6, 12, 35, 36, 38-43, 45-49] | [6, 37] | [11, 12, 35, 36, 38, 39, 42, 44, 47-49] | 43 | [46-48] | 44 |
| OHC (i) | - | [12, 13] | [12, 13] | - | 12 | - | - | 13 |

(a) Supervised machine learning; (b) Unsupervised machine learning; (c) Classical machine learning; (d) Modern machine learning; (e) Surveillance; (f) Health communication; (g) Sentiment analysis; (h) Lexicon; (i) Online Health Communities

3.4.1 Opioid Abuse

Opioid abuse is now recognised as one of the leading public health problems in the United States 4, and an important – albeit slightly less pressing – concern in many developed and developing countries. The crisis in the US is due to historical changes in drug prescription policies and practices that have encouraged both the licit and illicit use of highly addictive opioid-based painkillers 5. Every year in the United States, over 72,000 people die as a direct consequence of using opioids 6, making the need to understand emerging opioid-related behaviours and user trajectories especially pressing. One study concentrated on identifying public reactions to the opioid epidemic by identifying the most popular opioid-related topics tweeted by users 41. Topics identified included discussions related to the possibility of promoting marijuana as a substitute for opioids, discussions related to the growing opioid market in North America, and discussions related to news reports advocating the use of buprenorphine – a narcotic used to treat opioid addiction – for adolescents experiencing opioid use disorders. Another study 35 aimed at detecting marketing and sale of opioids by illicit online sellers. The authors observed that the frequency of tweets directly related to illegal activity was relatively low when compared with other kinds of opioid mentions. A similar observation was made for tweets promoting the illegal online sale of fentanyl 42. In this context, unsupervised approaches are of significant value for understanding changes in a rapidly developing online environment.

3.4.2 Tobacco, E-Cigarette, and Marijuana Use and Abuse

Tobacco use is declining in popularity in much of the developed world (the proportion of smokers in the US has declined by over half since 1964 and now stands at 16.8% among adults, and approximately half that among high school students 85 ). However, despite this decrease in tobacco use, there has been a dramatic increase – now plateauing – in the use of e-cigarettes since their introduction to developed world markets in around 2007 86. This increase has occurred in the context of a lack of consensus regarding both the safety of the product 87 and its potential efficacy as a smoking cessation device 88. In addition to these shifts in tobacco use, there have also been substantial changes in the regulation of marijuana products, particularly in the US context, and these changes have led – it has been suggested 89 – to an increase in marijuana use 90. Given these public health concerns, using NLP to investigate tobacco, e-cigarette, and marijuana use has become an active research area, especially to classify discussions [6, 12, 43, 45, 46] or to determine whether a particular user is above or below 21 years of age 40. Reported findings included evidence that Twitter users frequently discussed ways in which e-cigarettes can be used in the workplace in a bid to circumvent smoking bans 43, and evidence that hookah was discussed more frequently at the weekend, indicating its use is associated with leisure activities, while reported tobacco use tends to be more consistent across the week 40. In addition, authors observed that different social media services manifested distinctly different cultures regarding e-cigarette use, e.g., sensory experiences vs. psychological factors associated with quitting 13. Rule-based approaches were used to identify where people reported using e-cigarettes, with 39% of posts referring to e-cigarette use in the classroom 49. Other studies aimed at describing strategies for marketing Little Cigars & Cigarillos (LCC) and observed that 83% of identified LCC tweets referred to marijuana, and 29% of LCC tweets referenced memes 45.

3.4.3 Alcohol Abuse

Alcohol abuse was the seventh leading risk-factor worldwide for both death and disability in 2016. In the same year, among males aged 15-49, alcohol was a causal factor in 12% of deaths 91. One of the reviewed studies 36 yielded the surprising result that – in the US at least – a positive correlation exists between excessive county-level alcohol consumption and higher education, suggesting that highly educated counties drink more, or at least tweet more about their drinking.
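County-level findings of this kind rest on a correlation between two aggregated measures. The sketch below computes a Pearson correlation coefficient in plain Python; the function name `pearson_r`, the variable names, and the five county values are hypothetical, and the reviewed study may have used a different statistic or model.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical county-level values, for illustration only:
# alcohol-related tweets per 1,000 tweets, and a survey-based measure
# of excessive drinking (percent of adults).
tweet_rate = [1.2, 2.5, 3.1, 4.8, 5.0]
survey_rate = [10.0, 14.0, 15.5, 19.0, 21.0]
r = pearson_r(tweet_rate, survey_rate)
print(f"r = {r:.3f}")
```

A positive coefficient on aggregated data describes counties rather than individuals, so it does not by itself show that more educated people drink more, which is consistent with the cautious wording of the finding above.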

3.5 Pharmacovigilance

Pharmacovigilance – i.e. the post-market surveillance of drugs – was an early health-related focus for social media NLP 92 , 93 and has remained an important subject of research, with applications including the identification of mentions of Adverse Drug Reactions (ADRs) 51 , 55 . One recent study focussed on topics related to Thyroid Hormone Replacement Therapy (THRT), particularly on the identification of side effects 50 . It was discovered that male and female users of THRT had different experiences and concerns regarding side effects, with women primarily concerned about the effect of the drug on personal appearance and men more concerned about potential pain symptoms associated with the drug.

A recent significant development in pharmacovigilance research was the instigation of the SMM4H 2017 shared task. The shared task consisted of three subtasks: automatic identification of ADRs, automatic classification of tweets that explicitly mentioned medication consumption, and normalization of ADR mentions. Important outputs of this effort included a publicly available corpus 51 and language models 55 for future research. In addition to this work on ADR identification and normalization, the identification of semantic relationships – chiefly causal relationships – between drug and symptom mentions has been a focus of research 52, 53. A key challenge associated with this task is the difficulty involved in distinguishing between drug use as a response to a particular symptom (“I have a horrible headache and just took some ibuprofen”) and the existence of a symptom as a side effect of a drug (“Ever since I started taking Sertraline I’ve felt like crap”). Despite the difficulty of this task, Bollegala et al. achieved a moderately high F-score (0.74) using a skip-gram based method 52.

Six of the pharmacovigilance papers reviewed used Twitter as a data source [51-56], while one used an online health community (see Table 7). Four of the papers used supervised methods [51-54] and five used unsupervised methods [50, 53-56], with five using classical machine learning methods [50-53, 56] and three using modern machine learning methods [51, 54, 55]. Unsurprisingly given the topic of pharmacovigilance, surveillance was the main application area.

Table 7. Summary of pharmacovigilance-related papers.

| Data Source | SML (a) | UML (b) | CML (c) | MML (d) | Surv (e) | HC (f) | Senti (g) | Lexicon (h) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Twitter | [51-54] | [53-56] | [51-53, 56] | [51, 54, 55] | [51-54, 56] | - | - | - |
| OHC (i) | - | 50 | 50 | - | - | - | - | - |

(a) Supervised machine learning; (b) Unsupervised machine learning; (c) Classical machine learning; (d) Modern machine learning; (e) Surveillance; (f) Health communication; (g) Sentiment analysis; (h) Lexicon-based methods; (i) Online Health Communities

3.6 Mental Health

Mental health problems are estimated to account for 13% of the global burden of disease, as measured in Disability Adjusted Life Years 95. Using social media as a resource to understand mental health is a research area that has experienced substantial growth in recent years 96, given the burden of disease associated with mental health problems and the fact that social media provides ready access to first person reports of behaviour, thoughts, and feelings. Reviewed studies covered a range of mental health topics, including predicting depression diagnosis 8, assessing suicide risk [16, 18, 24, 74-76, 98, 99], and developing a better understanding of users’ experiences of eating disorders 15, schizophrenia 59, 61, grief processes among gang-involved youth 58, relaxation 62, stress 63, pathological empathy 67, 72, and negative emotional effects associated with campus-based mass murders 64. Related to this, a range of metrics have been used to characterize language use associated with specific mental health conditions, with lexical diversity, readability scores, sentence complexity, negation, uncertainty, and degree of repetition all used during the review period [23, 26, 27, 60]. In novel work focussing on the relationship between clinical guidelines and actual treatments, Zhang et al. 71 created a catalogue of real-world treatments used – as opposed to merely discussed – by parents of children with autistic spectrum disorder, and then automatically identified their frequency of mention in two online autism forums.
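Of the language-use metrics listed above, lexical diversity is the simplest to illustrate. One common operationalisation is the type-token ratio, i.e. the number of distinct word forms divided by the total number of words. The sketch below is an assumption about how such a metric could be computed; the function name `type_token_ratio` and the example sentence are hypothetical, and the reviewed papers may have used other diversity measures.

```python
import re


def type_token_ratio(text):
    """Lexical diversity as distinct word forms (types) divided by total words (tokens)."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed inside words).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical post, for illustration only: 5 distinct words out of 11.
ttr = type_token_ratio("The cat saw the dog and the dog saw the cat")
print(f"type-token ratio = {ttr:.3f}")
```

Because the ratio tends to fall as texts get longer, comparisons between users or communities are usually made on samples of similar length.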

With a view to improving how mental health forums are designed, one study applied textual cluster analysis to forums related to the conditions anxiety, depression, and post-traumatic stress disorder (PTSD) 19, showing that – consistent with current thinking regarding the relationship between PTSD and anxiety 97 – anxiety and PTSD forums shared more similarities to each other than to the depression forum. Related to this, another study found that different communities provided different degrees of emotional and informational support 20, with some communities (e.g., depression forums) focussed primarily on emotional support, and other communities (e.g., obsessive compulsive disorder forums) offering a greater proportion of informational support. Furthermore, the same study found that at the user level, the provision of social support was correlated with demonstrated linguistic accommodation, suggesting that those users who were able to “match” the linguistic culture of a particular community were likely to receive a greater volume of social support. Finally, a further study 100 involved the development of a classifier capable of distinguishing respectful uses of a mental-health related term (e.g. “I’m fuming. How dare a TV show portray folks suffering from mental health issues so unfairly”) from less-respectful usage.

Of the thirty-one mental health-related papers reviewed (see Table 8 ), thirteen involved the use of Reddit data [15-27], ten used Twitter data [18, 24, 58-65], one used Instagram 18 , three used Facebook [8, 18, 67], six used OHC data [70-75], and one used data derived from Weibo 76 , with twenty-two of the papers utilising supervised machine learning methods [8, 16, 18, 20-22, 24, 25, 58-62, 65, 67, 70-76], and twelve papers utilising unsupervised machine learning [8, 15, 18-22, 27, 59, 60, 70, 72]. The majority of the papers reported on the use of classical machine learning approaches [8, 15, 16, 18-20, 22, 24, 25, 27, 58-62, 65, 67, 71, 73-76], with a minority using modern machine learning methods [18, 21, 22, 67, 70, 72]. Four of the mental health papers reviewed utilised primarily lexicon-based methods [17, 23, 63, 64].

Table 8. Summary of mental health-related papers.

| Data Source | SML (a) | UML (b) | CML (c) | MML (d) | Surv (e) | HC (f) | Senti (g) | Lexicon (h) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Reddit | [16, 18, 20-22, 24, 25] | [15, 18-22, 27] | [15, 16, 18-20, 22, 24, 25, 27] | [21, 22] | - | - | 26 | [17, 23] |
| Twitter | [18, 58-62, 65] | [18, 59, 60] | [58-62, 65] | 18 | - | - | [63, 64, 24] | [63, 64] |
| Instagram | 18 | 18 | - | 18 | - | - | - | - |
| Facebook | [8, 18, 67] | [8, 18] | [8, 67] | [18, 67] | - | - | - | - |
| OHC (i) | [70-75] | [70, 72] | [71, 73-75] | [70, 72] | - | - | - | - |
| Weibo | 76 | - | 76 | - | - | - | - | - |

(a) Supervised machine learning; (b) Unsupervised machine learning; (c) Classical machine learning; (d) Modern machine learning; (e) Surveillance; (f) Health communication; (g) Sentiment analysis; (h) Lexicon-based methods; (i) Online Health Communities

3.7 Ethical Issues

Two types of ethics-related papers are discussed in this section: those that are focussed on empirical ethics (i.e. the empirical investigation of ethical beliefs and practices) 101 , 102 , and those that are focussed on ethical guideline development (i.e. the generation of theoretical frameworks and practical guidelines for conducting health-related NLP research with social media) [9, 103, 104]. Reviewed studies highlighted the need for both transparency in the development of algorithms and an ethical framework to guide the appropriate use of social media for computational public health research.

Focussing specifically on research ethics from the perspective of social media users, one study [102] pointed to a generally favourable view of the use of computational methods for public health research among social media users, provided that data was highly aggregated and the goal of the work was of significant public health value (e.g. opioid abuse surveillance was acceptable in a public health context, but not when used for employment screening). However, some users remained concerned about the robustness of both the data and the research methods, because the data was not representative of the general population and was subject to impression management (i.e. many users did not tweet about stigmatising health problems [105]). Related to this work, a systematic review of attitudes towards the ethics of computational social media research [106] found a range of views on appropriate research ethics, depending on the particular research topic discussed. This suggests that a “blanket” approach to research ethics is currently not appropriate, and that ethical deliberations ought instead to take into account the particular context of the research under review.

As noted by Vayena et al. [104], the research regulation infrastructure in most jurisdictions was developed in the period prior to social media, and hence is not well-equipped to manage the review of computational social media research. This point is reinforced by a qualitative study conducted with Research Ethics Committee (Institutional Review Board) members in the United Kingdom. This study outlines the challenges faced by ethics committees in the application of existing research ethics regulation to computational work and emphasises the need to protect research participants (i.e. social media users), even in the context of research using publicly available data [101].

Finally, practical guidelines have recently been developed to guide NLP research using social media data [103]. Eight principles are outlined, including the stipulations that, as most social media-based NLP research can be defined as human subjects research [107], ethical approval or exemption ought to be obtained from an Institutional Review Board or Research Ethics Committee; that data ought to be de-identified for use in publications and presentations; and that caution ought to be exercised in linking data.
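As an illustration of the de-identification principle mentioned above, the following minimal Python sketch masks some directly identifying strings in a post before it is quoted. The patterns and the `deidentify` function are our own illustrative assumptions and are not taken from the cited guidelines [103]; masking of this kind is unlikely to be sufficient on its own, since verbatim text can often be traced back to its author through a search engine.

```python
import re

# Hypothetical patterns for illustration only; real de-identification
# needs far broader coverage (names, locations, dates, and so on).
URL = re.compile(r"https?://\S+")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
MENTION = re.compile(r"@\w+")


def deidentify(text: str) -> str:
    """Replace URLs, e-mail addresses, and @-mentions with placeholders."""
    text = URL.sub("<URL>", text)
    # E-mail addresses are masked before mentions so that the "@domain"
    # part of an address is not mistaken for a user handle.
    text = EMAIL.sub("<EMAIL>", text)
    text = MENTION.sub("<USER>", text)
    return text


if __name__ == "__main__":
    print(deidentify("@jane_doe see https://example.com/x or mail a.b@example.org"))
```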

In recent years there has been a move away from the commonly held view that in social media research “anything goes”, towards a more sophisticated perspective that acknowledges both the existence and importance of the ethical and regulatory issues involved in the application of NLP to social media for health research. Further, the provision of ethical guidelines developed specifically for NLP researchers (as described above [103]) is a new and welcome development in the period since 2016.

4 Discussion and Conclusion

In this survey, we have presented recent advances in the application of NLP to social media to address public health research questions. We observed a substantial growth in the area of mental health and substance abuse research, and a continuing sustained interest in the use of social media for studying communicable diseases (particularly in the area of vaccine hesitancy). The widespread use of lexical resources developed in the psychology research communities – specifically, LIWC – is also notable, as is the relatively low frequency of “modern” (as opposed to “classical”) machine learning approaches.
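To make the idea of a lexicon-based method concrete, the sketch below computes, for a single post, the proportion of tokens that fall into each category of a small word list. The two categories and their word lists are invented for illustration and are not the LIWC dictionary, which is a licensed resource; the tokeniser is also a deliberate simplification.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration; NOT the LIWC dictionary.
LEXICON = {
    "negative_emotion": {"sad", "hopeless", "worthless", "tired"},
    "first_person": {"i", "me", "my", "myself"},
}


def category_rates(text, lexicon=LEXICON):
    """Return, per category, the fraction of tokens found in that category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {category: counts[category] / total for category in lexicon}


if __name__ == "__main__":
    print(category_rates("I feel sad and my days are hopeless"))
```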

While predicting future trends is not a straightforward task, we tentatively suggest four directions in which current work is evolving. First, linking data (with appropriate consent) from the EHR and social media, both in the context of public health research and clinical care. Examples of this type of work in the research context already exist (e.g. [8]), and such work will likely be a focus of considerable research effort over the next few years.

Second, further utilisation of social media in public health surveillance. Currently, while advances have been made in research using NLP and social media, substantial barriers still exist to implementing social media health surveillance in the context of public health practice. These barriers include costs (public health agencies are frequently underfunded), limited expertise in NLP, and difficulties in integrating social media analysis with existing surveillance methods and pipelines. However, even given these challenges, considerable strides have been made, particularly in the area of pharmacovigilance (e.g. the Food & Drug Administration Center for Drug Evaluation and Research).

Third, much social media research relies on the identification of appropriate keywords to construct a data sample suitable for the research question at hand. This keyword selection process has typically relied on intuition. However, recently there has been a move towards a more data-driven means of iteratively identifying and evaluating keywords (and their associated synonyms), using word embeddings and other empirical synonym discovery methods (e.g. [108]). This shift towards a more principled method of selecting keywords for data sampling is to be welcomed.
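One round of such embedding-based keyword expansion might look like the following minimal sketch. The three-dimensional vectors, the vocabulary, and the similarity threshold are toy values assumed purely for illustration (real embeddings would be learned from a large corpus), and the approach shown is a generic nearest-neighbour search rather than the specific method of the cited work. In practice, the returned candidates would be reviewed manually before being added to the seed set for the next iteration.

```python
import math

# Toy 3-d vectors standing in for learned word embeddings (hypothetical values).
EMBEDDINGS = {
    "vape": [0.9, 0.1, 0.0],
    "ecig": [0.8, 0.2, 0.1],
    "juul": [0.85, 0.15, 0.05],
    "flu": [0.0, 0.9, 0.3],
    "pizza": [0.1, 0.0, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seeds, embeddings, threshold=0.9):
    """Return non-seed words whose best similarity to any seed meets the threshold,
    ordered from most to least similar."""
    candidates = {}
    for word, vec in embeddings.items():
        if word in seeds:
            continue
        best = max(
            (cosine(vec, embeddings[s]) for s in seeds if s in embeddings),
            default=0.0,
        )
        if best >= threshold:
            candidates[word] = best
    return sorted(candidates, key=candidates.get, reverse=True)


if __name__ == "__main__":
    print(expand_keywords({"vape"}, EMBEDDINGS))
```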

Fourth, while we believe that Twitter will remain a valuable (and popular) data source for NLP research, we suspect that Reddit will become increasingly popular as a research resource, partly due to its “research-friendly” terms and conditions and its increasing user base. Related to this, the dynamism of the social media ecosystem should not be underestimated, with new services (e.g. TikTok) attracting users – especially new adolescent users – away from existing services. Given this rapidly changing social media environment, there is little reason to believe that currently popular social media platforms will maintain their current level of popularity.

Acknowledgements

This work was partially supported by the National Institute on Drug Abuse of the United States National Institutes of Health under award number R21DA043775. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

1 For example, the Social Media Mining for Health Applications (SMM4H) Workshop or the Computational Linguistics and Clinical Psychology (CLPsych) Workshop.

2 Note that the terms “classical” and “modern” machine learning are, from a historical perspective, misnomers, given the roots of neural network theory in the mid-twentieth century [78].

References

  • 1. Fox S. The social life of health information. Available from: http://www.pewresearch.org/fact-tank/2014/01/15/the-social-life-of-health-information/
  • 2. Paul M, Dredze M. Social Monitoring for Public Health. Marchionini G, editor. Morgan Claypool; 2017.
  • 3. Guntuku S, Yaden D, Kern M, Ungar L, Eichstaedt J. Detecting depression and mental illness on social media: an integrative review. Curr Opin Behav Sci. 2017;18:43–9.
  • 4. Trautmann S, Rehm J, Wittchen H U. The economic costs of mental disorders: Do our societies react appropriately to the burden of mental disorders? EMBO Rep. 2016;17(09):1245–9. doi: 10.15252/embr.201642951.
  • 5. Tangherlini T R, Roychowdhury V, Glenn B, Crespi C M, Bandari R, Wadia A et al. “Mommy Blogs” and the vaccination exemption narrative: results from a machine-learning approach for story aggregation on parenting social media sites. JMIR Public Health Surveill. 2016;2(02):e166. doi: 10.2196/publichealth.6586.
  • 6. Allem J P, Dharmapuri L, Unger J B, Cruz T B. Characterizing JUUL-related posts on Twitter. Drug Alcohol Depend. 2018;190:1–5. doi: 10.1016/j.drugalcdep.2018.05.018.
  • 7. Charles-Smith L, Reynolds T, Cameron M, Conway M, Lau E, Olsen J et al. Using social media for actionable disease surveillance and outbreak management: a systematic literature review. PLoS One. 2015;10(10):e0139701. doi: 10.1371/journal.pone.0139701.
  • 8. Eichstaedt J C, Smith R J, Merchant R M, Ungar L H, Crutchley P, Preoţiuc-Pietro D et al. Facebook language predicts depression in medical records. Proc Natl Acad Sci U S A. 2018;115(44):11203–8. doi: 10.1073/pnas.1802331115.
  • 9. Vayena E, Blasimme A, Cohen I G. Machine learning in medicine: Addressing ethical challenges. PLoS Med. 2018;15(11):e1002689. doi: 10.1371/journal.pmed.1002689.
  • 10. Park A, Conway M. Tracking health related discussions on Reddit for public health applications. AMIA Annu Symp Proc. 2017;2017:1362–71.
  • 11. Meacham M, Paul M, Ramo D. Understanding emerging forms of cannabis use through an online cannabis community: An analysis of relative post volume and subjective highness ratings. Drug Alcohol Depend. 2018;188:364–9. doi: 10.1016/j.drugalcdep.2018.03.041.
  • 12. Zhan Y, Liu R, Li Q, Leischow S, Zeng D. Identifying topics for e-cigarette user-generated contents: a case study from multiple social media platforms. J Med Internet Res. 2017;19(01):e24. doi: 10.2196/jmir.5780.
  • 13. Chen A, Zhu S H, Conway M. What online communities can tell us about electronic cigarettes and hookah use: a study using text mining and visualization techniques. J Med Internet Res. 2015;17(09):e220. doi: 10.2196/jmir.4517.
  • 14. Nobles A, Dreisbach C, Keim-Malpass J, Barnes L. “Is This an STD? Please Help!”: Online Information Seeking for Sexually Transmitted Diseases on Reddit. In: Proceedings of the Twelfth International Conference on Web and Social Media; 2018. p. 660-3.
  • 15. Moessner M, Feldhege J, Wolf M, Bauer S. Analyzing big data in social media: text and network analyses of an eating disorder forum. Int J Eat Disord. 2018;51(07):656–67. doi: 10.1002/eat.22878.
  • 16. Aladag A, Muderrisoglu S, Akbas N, Zahmacioglu O, Bingol H. Detecting suicidal ideation on forums: proof-of-concept study. J Med Internet Res. 2018;20(06):e215. doi: 10.2196/jmir.9840.
  • 17. Park A, Conway M. Harnessing Reddit to understand the written-communication challenges experienced by individuals with mental health disorders: analysis of texts from mental health communities. J Med Internet Res. 2018;20(04):e121. doi: 10.2196/jmir.8219.
  • 18. Coppersmith G, Leary R, Crutchley P, Fine A. Natural language processing of social media as screening for suicide risk. Biomed Inform Insights. 2018;10:1178222618792860. doi: 10.1177/1178222618792860.
  • 19. Park A, Conway M, Chen A. Examining thematic similarity, difference, and membership in three online mental health communities from Reddit: a text mining and visualization approach. Comput Human Behav. 2018;78:98–112. doi: 10.1016/j.chb.2017.09.001.
  • 20. Sharma E, De Choudhury M. Mental health support and its relationship to linguistic accommodation in online communities. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. CHI ’18. New York, NY, USA: ACM; 2018. p. 641:1-641:13. Available from: http://doi.acm.org/10.1145/3173574.3174215
  • 21. Ive J, Gkotsis G, Dutta R, Stewart R, Velupillai S. Hierarchical neural model with attention mechanisms for the classification of social media text related to mental health. In: Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic. Association for Computational Linguistics; 2018. p. 69-77. Available from: http://aclweb.org/anthology/W18-0607
  • 22. Gkotsis G, Oellrich A, Velupillai S, Liakata M, Hubbard T, Dobson R et al. Characterisation of mental health conditions in social media using Informed Deep Learning. Sci Rep. 2017;7:45141. doi: 10.1038/srep45141.
  • 23. Park A, Conway M. Longitudinal changes in psychological states in online health community members: understanding the long-term effects of participating in an online depression community. J Med Internet Res. 2017;19(03):e71. doi: 10.2196/jmir.6826.
  • 24. Kavuluru R, Williams A G, Ramos-Morales M, Haye L, Holaday T, Cerel J. Classification of helpful comments on online suicide watch forums. ACM BCB. 2016;2016:32–40. doi: 10.1145/2975167.2975170.
  • 25. De Choudhury M, Kiciman E, Dredze M, Coppersmith G, Kumar M. Discovering shifts to suicidal ideation from mental health content in social media. Proc SIGCHI Conf Hum Factor Comput Syst. 2016;2016:2098–110. doi: 10.1145/2858036.2858207.
  • 26. Gkotsis G, Oellrich A, Hubbard T, Dobson R, Liakata M, Velupillai S et al. The language of mental health problems in social media. In: Proceedings of the Third Workshop on Computational Lingusitics and Clinical Psychology. Association for Computational Linguistics; 2016. p. 63-73. Available from: http://aclweb.org/anthology/W16-0307
  • 27. Kumar M, Dredze M, Coppersmith G, De Choudhury M. Detecting changes in suicide content manifested in social media following celebrity suicides. HT ACM Conf Hypertext Soc Media. 2015;2015:85–94. doi: 10.1145/2700171.2791026.
  • 28. Mitra T, Counts S, Pennebaker J. Understanding anti-vaccination attitudes in social media. In: Proceedings of the Tenth International Conference on Web and Social Media (ICWSM 2016); 2016. p. 269-78.
  • 29. Surian D, Nguyen D Q, Kennedy G, Johnson M, Coiera E, Dunn A G. Characterizing Twitter discussions about HPV vaccines using topic modeling and community detection. J Med Internet Res. 2016;18(08):e232. doi: 10.2196/jmir.6045.
  • 30. Massey P, Leader A, Yom-Tov E, Budenz A, Fisher K, Klassen A. Applying multiple data collection tools to quantify human papillomavirus vaccine communication on Twitter. J Med Internet Res. 2016;18(12):e318. doi: 10.2196/jmir.6670.
  • 31. Wakamiya S, Kawai Y, Aramaki E. Twitter-based influenza detection after flu peak via tweets with indirect information: text mining study. JMIR Public Health Surveill. 2018;4(03):e65. doi: 10.2196/publichealth.8627.
  • 32. Al-Garadi M A, Khan M S, Varathan K D, Mujtaba G, Al-Kabsi A M. Using online social networks to track a pandemic: A systematic review. J Biomed Inform. 2016;62:1–11. doi: 10.1016/j.jbi.2016.05.005.
  • 33. Mowery J. Twitter Influenza Surveillance: Quantifying Seasonal Misdiagnosis Patterns and their Impact on Surveillance Estimates. Online J Public Health Inform. 2016;8(03):e198. doi: 10.5210/ojphi.v8i3.7011.
  • 34. Zhang L, Hall M, Bastola D. Utilizing Twitter data for analysis of chemotherapy. Int J Med Inform. 2018;120:92–100. doi: 10.1016/j.ijmedinf.2018.10.002.
  • 35. Mackey T, Kalyanam J, Klugman J, Kuzmenko E, Gupta R. Solution to detect, classify, and report illicit online marketing and sales of controlled substances via Twitter: using machine learning and web forensics to combat digital opioid access. J Med Internet Res. 2018;20(04):e10029. doi: 10.2196/10029.
  • 36. Curtis B, Giorgi S, Buffone A EK, Ungar L H, Ashford R D, Hemmons J et al. Can Twitter be used to predict county excessive alcohol consumption rates? PLoS One. 2018;13(04):e0194290. doi: 10.1371/journal.pone.0194290.
  • 37. Simpson S S, Adams N, Brugman C M, Conners T J. Detecting novel and emerging drug terms using natural language processing: a social media corpus study. JMIR Public Health Surveill. 2018;4(01):e2. doi: 10.2196/publichealth.7726.
  • 38. Ayers J, Dredze M, Leas E, Caputi T, Allem J P, Cohen J. Next generation media monitoring: Global coverage of electronic nicotine delivery systems (electronic cigarettes) on Bing, Google and Twitter, 2013–2018. PLoS One. 2018;13(11):e0205822. doi: 10.1371/journal.pone.0205822.
  • 39. Mackey T K, Kalyanam J, Katsuki T, Lanckriet G. Twitter-based detection of illegal online sale of prescription opioid. Am J Public Health. 2017;107(12):1910–1915. doi: 10.2105/AJPH.2017.303994.
  • 40. Huang T, Elghafari A, Relia K, Chunara R. High-resolution temporal representations of alcohol and tobacco behaviors from social media data. Proc ACM Hum Comput Interact. 2017 Nov;1(CSCW).
  • 41. Glowacki E M, Glowacki J B, Wilcox G B. A text-mining analysis of the public’s reactions to the opioid crisis. Subst Abus. 2017 Jul; p. 1-5.
  • 42. Mackey T K, Kalyanam J. Detection of illicit online sales of fentanyls via Twitter. F1000Res. 2017;6:1937. doi: 10.12688/f1000research.12914.1.
  • 43. Lazard A, Saffer A, Wilcox G, Chung A D, Mackert M, Bernhardt J. E-cigarette social media messages: a text mining analysis of marketing and consumer conversations on Twitter. JMIR Public Health Surveill. 2016;2(02):e171. doi: 10.2196/publichealth.6551.
  • 44. Kolliakou A, Ball M, Derczynski L, Chandran D, Gkotsis G, Deluca P et al. Novel psychoactive substances: An investigation of temporal trends in social media and electronic health records. Eur Psychiatry. 2016;38:15–21. doi: 10.1016/j.eurpsy.2016.05.006.
  • 45. Kostygina G, Tran H, Shi Y, Kim Y, Emery S. “Sweeter Than a Swisher”: amount and themes of little cigar and cigarillo content on Twitter. Tob Control. 2016;25(Suppl 1):i75–i82.
  • 46. Daniulaityte R, Chen L, Lamy F R, Carlson R G, Thirunarayan K, Sheth A. “When ‘Bad’ is ‘Good’”: Identifying Personal Communication and Sentiment in Drug-Related Tweets. JMIR Public Health Surveill. 2016;2(02):e162. doi: 10.2196/publichealth.6327.
  • 47. Kavuluru R, Sabbir A KM. Toward automated e-cigarette surveillance: Spotting e-cigarette proponents on Twitter. J Biomed Inform. 2016;61:19–26. doi: 10.1016/j.jbi.2016.03.006.
  • 48. Alvaro N, Conway M, Doan S, Lofi C, Overington J, Collier N. Crowdsourcing Twitter annotations to identify first-hand experiences of prescription drug use. J Biomed Inform. 2015;58:280–7. doi: 10.1016/j.jbi.2015.11.004.
  • 49. Kim A, Hopper T, Simpson S, Nonnemaker J, Lieberman A, Hansen H et al. Using Twitter data to gain insights into e-cigarette marketing and locations of use: An infoveillance study. J Med Internet Res. 2015;17(11):e251. doi: 10.2196/jmir.4466.
  • 50. Park S, Hong S. Identification of primary medication concerns regarding thyroid hormone replacement therapy from online patient medication reviews: text mining of social network data. J Med Internet Res. 2018;20(10):e11085. doi: 10.2196/11085.
  • 51. Sarker A, Belousov M, Friedrichs J, Hakala K, Kiritchenko S, Mehryary F et al. Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task. J Am Med Inform Assoc. 2018;25(10):1274–83. doi: 10.1093/jamia/ocy114.
  • 52. Bollegala D, Maskell S, Sloane R, Hajne J, Pirmohamed M. Causality patterns for detecting adverse drug reactions from social media: text mining approach. JMIR Public Health Surveill. 2018;4(02):e51. doi: 10.2196/publichealth.8214.
  • 53. Kagashe I, Yan Z, Suheryani I. Enhancing seasonal influenza surveillance: topic analysis of widely used medicinal drugs using Twitter data. J Med Internet Res. 2017;19(09):e315. doi: 10.2196/jmir.7393.
  • 54. Cocos A, Fiks A, Masino A. Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts. J Am Med Inform Assoc. 2017;24(04):813–21. doi: 10.1093/jamia/ocw180.
  • 55. Sarker A, Gonzalez G. A corpus for mining drug-related knowledge from Twitter chatter: Language models and their utilities. Data Brief. 2017;10:122–131. doi: 10.1016/j.dib.2016.11.056.
  • 56. Sarker A, O’Connor K, Ginn R, Scotch M, Smith K, Malone D et al. Social media mining for toxicovigilance: automatic monitoring of prescription medication abuse from Twitter. Drug Saf. 2016;39(03):231–40. doi: 10.1007/s40264-015-0379-4.
  • 57. Young S D, Mercer N, Weiss R E, Torrone E A, Aral S O. Using social media as a tool to predict syphilis. Prev Med. 2018;109:58–61. doi: 10.1016/j.ypmed.2017.12.016.
  • 58. Patton D U, MacBeth J, Schoenebeck S, Shear K, McKeown K. Accommodating grief on Twitter: an analysis of expressions of grief among gang involved youth on Twitter using qualitative analysis and natural language processing. Biomed Inform Insights. 2018;10:1178222618763155. doi: 10.1177/1178222618763155.
  • 59. Ernala S, Labetoulle T, Bane F, Birnbaum M, Rizvi A, Kane J et al. Characterizing Audience Engagement and Assessing Its Impact on Social Media Disclosures of Mental Illnesses. In: Proceedings of the Twelfth International Conference on Web and Social Media; 2018. p. 62-71.
  • 60. Guntuku S C, Ramsay J R, Merchant R M, Ungar L H. Language of ADHD in adults on social media. J Atten Disord. 2017 Nov:1087054717738083.
  • 61. Birnbaum M, Ernala S, Rizvi A, De Choudhury M, Kane J. A collaborative approach to identifying social media markers of schizophrenia by employing machine learning and clinical appraisals. J Med Internet Res. 2017;19(08):e289. doi: 10.2196/jmir.7956.
  • 62. Doan S, Ritchart A, Perry N, Chaparro J, Conway M. How do you relax when you’re stressed? A content analysis and infodemiology study of stress-related tweets. JMIR Public Health Surveill. 2017;3(02):e35. doi: 10.2196/publichealth.5939.
  • 63. Loveys K, Crutchley P, Wyatt E, Coppersmith G. Small but mighty: affective micropatterns for quantifying mental health from social media language. In: Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology – From Linguistic Signal to Clinical Reality. Association for Computational Linguistics; 2017. p. 85-95. Available from: http://aclweb.org/anthology/W17-3110
  • 64. Jones N, Wojcik S, Sweeting J, Silver R C. Tweeting negative emotion: an investigation of Twitter data in the aftermath of violence on college campuses. Psychol Methods. 2016;21(04):526–41. doi: 10.1037/met0000099.
  • 65. Mowery D, Park A, Bryan C, Conway M. Towards Automatically Classifying Depressive Symptoms from Twitter Data for Population Health. In: Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES). The COLING 2016 Organizing Committee; 2016. p. 182-191. Available from: http://aclweb.org/anthology/W16-4320
  • 66. Faasse K, Chatman C, Martin L. A comparison of language use in pro- and anti- vaccination comments in response to a high profile Facebook post. Vaccine. 2016;34(47):5808–5814. doi: 10.1016/j.vaccine.2016.09.029.
  • 67. Abdul-Mageed M, Buffone A, Peng H, Eichstaedt J, Ungar L. Recognizing pathogenic empathy in social media. In: Proceedings of the Eleventh International Conference on Web and Social Media; 2017. p. 448-51.
  • 68. Zhang S, Grave E, Sklar E, Elhadad N. Longitudinal analysis of discussion topics in an online breast cancer community using convolutional neural networks. J Biomed Inform. 2017;69:1–9. doi: 10.1016/j.jbi.2017.03.012.
  • 69. Zhang S, Qiu L, Chen F, Zhang W, Yu Y, Elhadad N. “We make choices we think are going to save us”: Debate and stance identification for online breast cancer CAM discussions. Proc Int World Wide Web Conf. 2017 Apr;2017:1073–81.
  • 70. Khanpour H, Caragea C. Fine-grained emotion detection in health-related online posts. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics; 2018. p. 1160-6. Available from: http://aclweb.org/anthology/D18-1147
  • 71. Zhang S, Kang T, Qiu L, Zhang W, Yu Y, Elhadad N. Cataloguing treatments discussed and used in online autism communities. Proc Int World Wide Web Conf. 2017;2017:123–31. doi: 10.1145/3038912.3052661.
  • 72. Khanpour H, Caragea C, Biyani P. Identifying empathetic messages in online health communities. In: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Asian Federation of Natural Language Processing; 2016. p. 246-51. Available from: http://aclweb.org/anthology/I17-2042
  • 73. Franco-Penya H, Mamani Sanchez L. Text-based experiments for predicting mental health emergencies in online web forum posts. In: Proceedings of the Third Workshop on Computational Lingusitics and Clinical Psychology. Association for Computational Linguistics; 2016. p. 193-7. Available from: http://aclweb.org/anthology/W16-0327
  • 74. Asgari E, Nasiriany S, Mofrad M. Text Analysis and Automatic Triage of Posts in a Mental Health Forum. In: Proceedings of the Third Workshop on Computational Lingusitics and Clinical Psychology. Association for Computational Linguistics; 2016. p. 153-7. Available from: http://aclweb.org/anthology/W16-0318
  • 75. Cohan A, Young S, Goharian N. Triaging mental health forum posts. In: Proceedings of the Third Workshop on Computational Lingusitics and Clinical Psychology. Association for Computational Linguistics; 2016. p. 143-7. Available from: http://aclweb.org/anthology/W16-0316
  • 76. Cheng Q, Li T M, Kwok C L, Zhu T, Yip P S. Assessing suicide risk and emotional distress in Chinese social media: a text mining and machine learning study. J Med Internet Res. 2017;19(07):e243. doi: 10.2196/jmir.7276.
  • 77. Guo H, Na X, Hou L, Li J. Classifying Chinese Questions Related to Health Care Posted by Consumers Via the Internet. J Med Internet Res. 2017;19(06):e220. doi: 10.2196/jmir.7156.
  • 78. Boden M. Mind as Machine: A History of Cognitive Science. OUP; 2006.
  • 79. Tausczik Y, Pennebaker J. The psychological meaning of words: LIWC and computerized text analysis methods. J Lang Soc Psychol. 2010;29(01):24–54.
  • 80. Brownstein C J, Sand F. HealthMap: the development of automated real-time internet surveillance for epidemic intelligence. Euro Surveill. 2007;12(11):E071129.5. doi: 10.2807/esw.12.48.03322-en.
  • 81. Collier N, Doan S, Kawazoe A, Matsuda-Goodwin R, Conway M, Tateno Y et al. BioCaster: detecting public health rumors with a Web-based text mining system. Bioinformatics. 2008;24(24):2940–1. doi: 10.1093/bioinformatics/btn534.
  • 82. Eysenbach G. Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet. J Med Internet Res. 2009;11(01):e11. doi: 10.2196/jmir.1157.
  • 83. Brownstein J S, Freifeld C C, Madoff L C. Digital disease detection-harnessing the Web for public health surveillance. N Engl J Med. 2009;360(21):2153–5, 2157.
  • 84. Salathe M, Bengtsson L, Bodnar T J, Brewer D D, Brownstein J S, Buckee C et al. Digital epidemiology. PLoS Comput Biol. 2012;8(07):e1002616. doi: 10.1371/journal.pcbi.1002616.
  • 85. Singh T, Arrazola R A, Corey C G, Husten C G, Neff L J, Homa D M et al. Tobacco use among middle and high school students-United States, 2011-2015. MMWR Morb Mortal Wkly Rep. 2016;65(14):361–7. doi: 10.15585/mmwr.mm6514a1.
  • 86. Grana R, Benowitz N, Glantz S A. E-cigarettes: a scientific review. Circulation. 2014;129(19):1972–86. doi: 10.1161/CIRCULATIONAHA.114.007667.
  • 87. McNeill A, Brose L, Calder R, Hitchman S. E-cigarettes: An Evidence Update - Report Commissioned by Public Health England. Public Health England; 2015.
  • 88. Polosa R. E-cigarettes: Public Health England’s evidence based confusion? Lancet. 2015;386(10000):1237–8.
  • 89. Pacula R L, Powell D, Heaton P, Sevigny E L. Assessing the effects of medical marijuana laws on marijuana use: the devil is in the details. J Policy Anal Manage. 2015;34(01):7–31. doi: 10.1002/pam.21804.
  • 90. Hasin D S, Shmulewitz D, Sarvet A L. Time trends in US cannabis use and cannabis use disorders overall and by sociodemographic subgroups: a narrative review and new findings. Am J Drug Alcohol Abuse. 2019:1–21. doi: 10.1080/00952990.2019.1569668.
  • 91. GBD 2016 Alcohol Collaborators. Alcohol use and burden for 195 countries and territories, 1990-2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet. 2018;392(10152):1015–35.
  • 92. Paul M, Sarker A, Brownstein J, Nikfarjam A, Scotch M, Smith K
  • 93. Bigeard E, Grabar N, Thiessard F. Detection and analysis of drug misuses. A study based on social media messages. Front Pharmacol. 2018;9:791. doi: 10.3389/fphar.2018.00791.
  • 94. Manchikanti L, Helm S, Fellows B, Janata J, Pampati V, Grider J et al. Opioid epidemic in the United States. Pain Physician. 2012;15(3 Suppl):ES9–38.
  • 95. Vigo D, Thornicroft G, Atun R. Estimating the true global burden of mental illness. Lancet Psychiatry. 2016;3(02):171–8. doi: 10.1016/S2215-0366(15)00505-2.
  • 96. Conway M, O’Connor D. Social media, big data, and mental health: current advances and ethical implications. Curr Opin Psychol. 2016;9:77–82. doi: 10.1016/j.copsyc.2016.01.004.
  • 97. Ginzburg K, Ein-Dor T, Solomon Z. Comorbidity of posttraumatic stress disorder, anxiety and depression: a 20-year longitudinal study of war veterans. J Affect Disord. 2010;123(1-3):249–57.
  • 98. Nock M, editor. The Oxford Handbook of Suicide and Self-Injury. OUP; 2014.
  • 99.Bryan C, Butner J, Sinclair S, Bryan A B, Hesse C, Rose A. Predictors of emerging suicide death among military personnel on social media networks. Suicide Life Threat Behav. 2018;48(04):413–30. doi: 10.1111/sltb.12370. [DOI] [PubMed] [Google Scholar]
  • 100.Hwang J D, Hollingshead K. Crazy mad nutters: the language of mental health. In: Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology. Association for Computational Linguistics; 2016. p. 52-62. Available from: http://aclweb.org/anthology/W16-0306
  • 101.Hibbin R A, Samuel G, Derrick G E. From “a fair game” to “a form of covert research”: research ethics committee members’ differing notions of consent and potential risk to participants within social media research. J Empir Res Hum Res Ethics. 2018;13(02):149–159. doi: 10.1177/1556264617751510. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Mikal J, Hurst S, Conway M. Ethical issues in using Twitter for population-level depression monitoring: a qualitative study. BMC Med Ethics. 2016;17:22. doi: 10.1186/s12910-016-0105-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Benton A, Coppersmith G, Dredze M. Ethical research protocols for social media health research. In: Proceedings of the First ACL Workshop on Ethics in Natural Language Processing. Association for Computational Linguistics; 2017. p. 94-102. Available from: http://aclweb.org/anthology/W17-1612
  • 104.Vayena E, Salathé M, Madoff L C, Brownstein J S. Ethical challenges of big data in public health. PLoS Comput Biol. 2015;11(02):e1003904. doi: 10.1371/journal.pcbi.1003904. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Goffman E. Stigma: Notes on the Management of Spoiled Identity. A Spectrum book. Englewood Cliffs, N.J.: Prentice-Hall; 1963.
  • 106.Golder S, Ahmed S, Norman G, Booth A. Attitudes toward the ethics of research using social media: a systematic review. J Med Internet Res. 2017;19(06):e195. doi: 10.2196/jmir.7082. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.O’Connor D. The apomediated world: regulating research when social media has changed research. Journal of Law, Medicine, and Ethics. 2013;41(02):470–83. doi: 10.1111/jlme.12056. [DOI] [PubMed] [Google Scholar]
  • 108.Adams N, Artigiani E, Wish E. Choosing your platform for social media drug research and improving your keyword filter list. Journal of Drug Issues. 2019:1–16.

Articles from Yearbook of Medical Informatics are provided here courtesy of Thieme Medical Publishers
