2019 Oct 4;27(2):315–329. doi: 10.1093/jamia/ocz162

Table 3.

Summary of the primary objectives, approaches, and findings from the studies included in this review

Study	Primary objective(s) and/or significance	Primary approach(es)	Primary finding(s)
Schifano et al⁴³	First study to explore web forums for drug abuse research. Objective was to analyze data from “web pages” related to information on consumption, manufacture and sales of psychoactive substances.	Manual exploration of search engines using drug names as keywords. User posts from 1633 websites were analyzed primarily for contents (personal intake and/or trading) and stance (pro- vs anti-drug).	18% of websites included pro-drug chatter, 10% included harm reduction, and 10% included drug trading. Previously unknown coingestion patterns were discovered.
McNaughton et al⁴⁴	To explore the sentiment expressed by opioid abusers and their endorsement behavior on internet forums. First study to employ automated methods for analyzing social media chatter related to abuse or misuse.	Mixed-effects multinomial logistic regression was applied to model the probability of endorsing, discouraging, mixed, or unclear messages per compound. Endorsement to discouragement ratios were estimated for each compound.	The following list (ordered), in terms of endorsement ratio, was obtained for the included drugs: oxymorphone, hydromorphone, hydrocodone, oxycodone, morphine, and tramadol.
Daniulaityte et al⁴⁶	To analyze nonmedical use of loperamide, as reported on a specific patient forum.	Retrieved posts mentioning loperamide from 2005 to 2011. A random sample of 258 posts was manually annotated to identify intent, dosage, and side effects.	The discussion suggested that high doses of loperamide are used to address opioid withdrawal symptoms or as a methadone substitute.
Davey et al⁴⁵	To analyze the key features of drug-related Internet forums and the communities.	Categories, themes, and attributions were manually analyzed from 8 forums (qualitative).	The study identified unique communities of recreational drug users that can provide information about new drugs and drug compounds.
Cameron et al⁴⁷	The development of a semantic web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media.	A drug abuse ontology is used to recognize 3 types of data, namely (1) entities, (2) relationships, and (3) triples. Basic natural language processing approaches are used to extract entities and relationships, and to identify sentiment.	The reported approach obtains 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In manual evaluation, the system obtains 36% precision in relationship identification, and 33% precision in triple extraction.
Hanson et al²⁵	To identify variations in the volume of Adderall chatter by time and geographic location in the United States, as well as commonly mentioned side effects and coingested substances.	Tweets containing the term Adderall were collected from November 2011 to May 2012, and a keyword-based approach was used to detect coingested substances and side effects using manual analysis of geolocation clusters and temporal patterns.	Twitter posts confirm Adderall as a study aid among college students. Twitter may contribute to normative behavior regarding its abuse.
Hanson et al⁴⁸	To analyze the networks of users who report abusing/misusing prescription medications.	Tweets mentioning prescription medications were collected from Twitter as well as users mentioning prescription medications multiple times. Social circles of 100 users were analyzed, particularly their discussions associated with prescription drug abuse.	Twitter users who discuss prescription drug abuse online are surrounded by others who also discuss it—potentially reinforcing a negative behavior and social norm.
McNaughton et al⁴⁹	To evaluate the reactions to the introduction of reformulated OxyContin. To identify methods aimed to defeat the abuse-deterrent properties of the product.	Posts spanning over 5 years collected from 7 forums were evaluated before and after the introduction of reformulated OxyContin on August 9, 2010. Qualitative and quantitative analyses of the posts were performed to assess proportions and sentiments.	Sentiment profile of OxyContin changed following reformulation. OxyContin was discouraged significantly more following reformulation. Frequency of posts reporting abuse decreased over time.
McNaughton et al⁵⁰	To assess the amount of discussion and endorsement for abuse of tapentadol and comparator drugs.	Internet messages posted between January 1, 2011, and September 30, 2012, on 7 web forums were evaluated. Proportions of posts and unique authors discussing tapentadol were compared with 8 comparator compounds.	Recreational abusers appeared to be less interested in discussing tapentadol abuse.
MacLean et al⁵¹	To assess the effectiveness of a specialized forum in helping misusers/abusers of prescription opioids.	A taxonomy describing the phases of addiction was developed, and the activities and linguistic features across phases of use/abuse, withdrawal, and recovery were examined. Statistical classifiers were developed to identify addiction, relapse, and recovery phases.	According to the forum data, almost 50% of recovering abusers relapsed, but their prognosis for recovery is favorable.
Shutler et al²³	Qualitatively assess tweets mentioning prescription opioids to determine if they represent abuse or nonabuse, or were not characterizable. To assess the connotation (positive, negative, noncharacterizable).	Manual categorization of posts into predefined categories—abuse, nonabuse, and not characterizable; and, in terms of connotation, positive, negative, and not characterizable.	Twitter can be a potential resource for monitoring prescription opioid use, as abuse is commonly described by users (mostly with a positive connotation).
Buntain and Goldbeck⁵²	To assess how tweets can augment a public health program that studies emerging patterns of illicit drug use.	The article proposed an architecture for collecting vast numbers of tweets over time. Automatic topic modeling was employed to identify topics, and temporal and geolocation-based analyses were discussed.	An architecture for mining Twitter data for drug abuse monitoring (illicit and prescription).
Katsuki et al⁵³	To conduct surveillance and analysis of tweets to characterize the frequency of prescription medication abuse-related chatter, and identify illegal online pharmacies involved in drug trading.	Tweets collected using medication keywords and street names were manually coded to indicate misuse or abuse behavior and attitude (positive/negative). Supervised machine learning automatically identified over 100 000 tweets mentioning abuse or promotion. Word frequency–based experiments identified associations. Geolocations were analyzed for geographic distributions.	The study found a large number of tweets (over 45 000) that directly marketed prescription medications illegally. Supervised machine learning showed adequate performance in automatic detection.
Chan et al⁵⁴	To manually analyze opioid chatter from Twitter.	Data was collected from Twitter over 2 weeks and manually coded (eg, personal vs general experiences including nonmedical use, and user sentiments toward opioids) for analysis.	Personal opioid misuse was the most common theme among the tweets analyzed.
Seaman and Giraud-Carrier⁵⁵	To present statistics about volume as well as attitudes toward distribution (selling/buying) and need.	Only a small number (500) of tweets were manually analyzed. New York–based tweets showed that buying/selling and “need” were the most common topics associated with the drug names.	Twitter users often express the need for Adderall and Xanax; chatter related to specific drugs is directly impacted by media events involving such substances.
Ding et al⁵⁶	To detect abuse-related posts and discover new, unknown street names for drugs.	A sample of Instagram posts was annotated for medical use, illicit use, not related, or not sure. Topic modeling (LDA) was used to track changes in hashtags. Hand-annotated tweets were used to identify proportions for abuse-related tweets. Manual analysis of hashtags performed to assess the performance of the word embeddings.	The topic modeling approach retrieves drug-related posts with 78.1% accuracy. Word embeddings learned from social media data are useful for finding new hashtags and street terms associated with abuse.
Jenhani et al⁵⁷	To propose methods for automatically detecting drug-abuse-related events from Twitter.	A hybrid approach consisting of a rule-based component and supervised machine learning is described. Automatically annotated tweets are used for evaluation, showing 0.51 F-score.	Machine learning based approach can detect events not detected by rules. Findings are limited by the fact that only automatically annotated data is used for evaluation, which is prone to errors.
Zhou et al⁵⁸	To explore the possibility of using multimedia data (images and text) to discover drug usage patterns at a fine-grained level with respect to demographics.	Posts were retrieved from Instagram using drug-related hashtags. An initial set of hashtags was used to create a dictionary of hashtags. User demographics, such as age and gender, were predicted using face-image analysis algorithms. Patterns of drug-usage associated with demographics, time and location were then analyzed.	Findings from social media mining are consistent with findings of the NSDUH (qualitatively), even at a fine-grained level.
Sarker et al²⁴	To verify that abuse information for abuse-prone medications in social media is higher than non–abuse-prone medications. To assess the possibility of automatically detecting abuse via NLP and machine learning. To compare automatically classified temporal data with past manual analysis.	Manually annotated 6400 tweets to indicate abuse vs nonabuse. Evaluation of automatic classification was performed via 10-fold cross-validation; tests for proportions of abuse-related posts between case and control medications. Compared classified Adderall tweets with past manual analysis.	There is significantly more abuse-related information for abuse-prone medications compared with non–abuse-prone medications. Supervised machine learning is an effective approach for automated monitoring.
Anderson et al⁵⁹	To determine if misuse or abuse could be detected via social media listening. To describe and characterize social media posts.	Posts were collected using generic, brand, and vernacular brand names and were reviewed manually by coders.	Agreement among raters in manual categorization was low (0.448). Analysis of posts revealed that 8.61% referenced misuse or abuse, including routes of intake. Web forums present a valuable new source for monitoring nonmedical use of medications.
Kalyanam et al⁶⁰	To demonstrate that the geographic variation of social media posts mentioning prescription opioid misuse strongly correlates with government estimates of prescription opioid misuse in the previous month.	Tweets were collected from 2012 to 2014, using opioid keywords. Tweets were automatically quantified using semantic distance with word centroids. Unsupervised classification/clustering used to group tweets mentioning opioid misuse. Volume of abuse-related chatter was correlated with NSDUH surveys, with separate correlations for different age groups.	Mentions of misuse or abuse of prescription opioids on Twitter correlate strongly with state-by-state NSDUH estimates.
Phan et al⁶¹	To verify that tweets contain patterns of drug abuse. To study the correlations among different levels of drug usage including abuse, addiction and death, and assess the applicability of large-scale systems for online social network-based drug abuse monitoring.	Manual annotation of opiate-mentioning tweets and basic feature selection methods were developed. Several machine learning classifiers were then trained and evaluated. Word co-occurrence patterns for abuse-indicating tweets were identified and used as features in machine learning experiments. Correlations between words and drug terms were computed.	The best performance was obtained by a decision tree-based classifier, but performance was low compared with human judgment.
Yang et al⁶²	To propose a multitask learning method to leverage images from Instagram for recognition of drug abuse. To identify user accounts involved in illicit drug trading.	A multitask learning method was employed for image classification (stage 1) and accounts of interest were identified. Drug-related patterns, temporal patterns, and relational information patterns were detected from the user timelines and potential dealer accounts were detected (stage 2).	A reproducible machine learning model for tracking and combating illicit drug trade on Instagram. The framework can be reused and improved for practical tracking and combating of illicit drug trade on Instagram.
Chary et al⁶³	Demonstrate that the geographic variation of tweets mentioning prescription opioid misuse strongly correlates with government estimates in the previous month.	Basic preprocessing was performed on tweets from 2012 to 2014 (signal tweets and basal tweets) collected by keywords linked to prescription opioid use (misspellings as well). Tweets were manually annotated and geodata was collected. Compared tweets with NSDUH.	State-by-state correlation between Twitter and NSDUH data was high. Correlation was strongest in NSDUH data for 18- to 25-year-olds.
D’Agostino et al⁶⁴	To examine the online Reddit community’s ability to target and support individuals recovering from opiate addiction.	Collected 100 Reddit posts and their comments from August 19, 2016. Manually annotated the posts/comments according to DSM-5 criteria to determine the addiction phases of individual users.	Demonstrated the supportive environment of the online recovery community and the willingness to share self-reported struggles to help others.
Cherian et al⁶⁵	To characterize information about codeine misuse through analysis of public posts on Instagram to understand text phrases related to misuse.	1156 posts were collected over 2 weeks from Instagram via hashtags and text associated with codeine misuse. Themes and culture around misuse were identified through manual analysis.	50% of reported abuse involved combining codeine with soda (lean). Common misuse mechanisms included coingestion with alcohol, cannabis, and benzodiazepines.
Graves et al⁶⁶	To determine whether Twitter data could be used to identify geographic differences in opioid-related discussion. To study whether opioid topics were significantly correlated with opioid overdose death rate.	Tweets collected using keywords from 2009 to 2015. Topic modeling (LDA) used to summarize contents into 50 topics. The correlations between topic distribution and census region, census division and opioid overdose death rates were quantified.	Selected topics were significantly correlated with county- and state-level opioid overdose death rates.
Hu et al⁶⁷	To build a system for effective drug abuse related data collection from social media and develop an annotation strategy for categorization of data (abuse vs nonabuse) and a deep learning model that can automatically categorize tweets.	More than 800 keywords were used to collect data, followed by crowd-sourced annotation of 4985 tweets. Deep learning model built on small annotated data and evaluated via 10-fold cross-validation. The geographic distribution of over 100 000 positively classified tweets was analyzed.	The crowd-sourced annotation method enabled annotation at a much faster rate and lower cost. Deep learning model achieved state-of-the-art classification performance. Semantic analysis of tweets revealed drug abuse behaviors. Geolocation-based analysis enabled the identification of geographic hotspots.
Chary et al⁶⁸	To demonstrate that data concerning polysubstance use can be extracted from online user posts, and that these data can be used to infer novel as well as known coingestion patterns.	Posts were retrieved via web scraping and basic natural language processing methods were applied to identify possible mentions of drugs. Correlation was computed between mentions of pairs of drugs to identify common ingestion patterns based on mentions of drugs.	183 coingestion combinations were discovered, including 44 that had not been studied before.
Fan et al⁶⁹	To propose a novel framework named AutoDOA to automatically detect opioid addicts from Twitter.	Five groups of annotators (18 persons) with domain expertise labeled 19 722 tweets from 2312 users to identify potential addicts. Using only annotations with full agreement, an approach relying on meta-path–based similarity was used to perform transductive classification of the users based on the tweets, their likes, and their networks.	Evaluation on annotated data shows that this method outperforms other approaches; a case study on 1132 identified heroin addicts qualitatively shows similarities with CDC estimates of overdoses.
Bigeard et al⁷⁰	To create a typology for drug abuse or misuse and methods for automatic detection and propose methods for classification of drug misuses by analyzing user-generated data in French social media.	1850 posts were annotated into 4 categories—misuse, normal use, no use, and unable to decide. Categories were used to create a typology of misuses and to evaluate an automatic system. Several machine learning algorithms were then trained on artificially balanced data to categorize among misuse, no use, and normal use.	Multinomial naïve Bayes is shown to achieve the best performance on the artificially balanced data. The manual categorization of the data reveals an elaborate typology of intentional and unintentional misuse. The annotator agreements are relatively low, showing the difficulty of the misuse annotation task.
Chen et al⁷¹	To qualitatively analyze posts about methylphenidate from French patient forums including an analysis of information about misuse or abuse.	Data were collected from French social networks that mentioned methylphenidate keywords. Text mining methods such as named entity recognition and topic modeling were used to analyze the chatter, including the identification of adverse reactions.	Analysis of the data revealed cases of misuse of the medication and abuse.
Pandrekar et al⁷²	To demonstrate the potential of analyzing social media (specifically Reddit) data to reveal patterns about opioid abuse at a national level.	Collected 51 537 Reddit posts between January 2014 and October 2017; evaluated psychological categories of the posts and characterized the extent of social support; performed topic modeling to determine major topics of interest and tracked differences between anonymous and nonanonymous posts.	The information shared on Reddit can provide a candid and meaningful resource to better understand the opioid epidemic.
Lossio-Ventura and Bian⁷³	To study and understand (1) the contents of opioid-related discussions on Twitter, (2) the coingestion of opioids with other substances, (3) the trajectory of individual-level opioid use behavior, and (4) the vocabulary used to discuss opioids.	310 323 tweets were collected over 4 months, and 124 143 tweets were included in the study following rule-based filtering. Keyword frequency and co-occurrence based methods were applied to meet the objectives of the study.	Although most of the chatter talked about use of opioids as legitimate pain relievers, there was considerable discussion about misuse or abuse and coingestion of opioids with other substances; 18 new terms for opioids, which were previously not encoded, were discovered.
Hu et al⁷⁴	To establish a framework for automatic, large-scale collection of tweets based on supervised machine learning and crowd sourcing, with a self-taught learning approach for automatic detection.	Data were collected from Twitter using keywords and following an initial annotation by the authors, crowdsourcing was utilized for obtaining reliable annotations. An iterative automatic classification approach is applied where the training data is augmented with machine-classified tweets to improve performance. Both traditional and neural network–based classifiers were experimented with.	The neural network–based (convolutional and recurrent) deep, self-taught learning algorithms outperformed traditional models in the binary classification task with ∼86% accuracy.
Adams et al⁷⁵	To demonstrate the benefit of mining platforms other than Twitter, and the use of word embeddings for keyword synonym discovery resulting in increased collected data.	The synonym discovery method was compared for finding terms relevant to marijuana and opioids from 2 sources—Twitter and Reddit.	The synonym discovery method yielded more synonyms from Reddit than Twitter. Twitter, however, provided more slang terms.
Lu et al⁷⁶	To demonstrate the insights that can be obtained from employing data mining techniques on social media to better understand drug addiction.	Collected 309 528 posts from 125 194 unique Reddit users between January 2012 and May 2018. Used a trained classifier to predict transition from casual drug discussion to drug recovery. Used a Cox regression model to calculate the likelihood of the transition.	Found that certain utterances and linguistic features of one’s post can help predict the transition to drug recovery and determined specific drugs that are associated more with transition to recovery, which offers insight into drug culture.
Tibebu et al⁷⁷	To assess if Twitter may be used as a data source for studying population-level opioid use and perceptions in Canada.	Collected 2602 tweets over 1 month and manually categorized 826 tweets to study usage and perceptions.	The analyzed tweets presented information about medical usage of opioids, impacts of opioid use on family and friends, and drug use in public places. Tweets representing user perceptions were mostly associated with the keywords heroin, fentanyl, and opioids.
Chancellor et al⁷⁸	To assess if Reddit contains information on clinically unverified alternative treatments to opioid use disorder, develop a machine learning approach for discovering posts representing alternative treatments, and identify commonly reported agents for successful recovery.	A transfer learning approach was developed to automatically detect posts discussing recovery from opioid use disorder and was applied to all the posts collected from 63 subreddits. An approach involving regular expressions and word embeddings is used to identify alternative treatments from the positively classified posts.	The transfer learning–based classification approach obtained accuracy of 91.7%, leading to 93 104 recovery posts. Common drugs discovered for alternative treatments included both prescription (eg, Loperamide, Xanax, Valium, Klonopin, gabapentin) and nonprescription (eg, kratom) drugs.

CDC: Centers for Disease Control and Prevention; LDA: latent Dirichlet allocation; NSDUH: National Survey on Drug Use and Health.
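Several rows of the table report precision, recall, or an F-score without stating how the measures relate. The following is a minimal sketch, not taken from any of the reviewed studies, of the standard F-score computation. The function name `f_score` and the `beta` parameter are illustrative choices. The input values are the entity-identification precision (85%) and recall (72%) listed for PREDOSE; the resulting F-score is derived here for illustration and is not a figure reported in the table. The table also does not state which F-score variant the studies used, so the balanced F1 (beta = 1) is an assumption.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score for a given precision and recall.

    With beta = 1 (an assumption; the table does not name the variant),
    this is the harmonic mean of precision and recall.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both measures are zero, so the score is defined as zero.
        return 0.0
    return (1.0 + beta ** 2) * precision * recall / denominator


# Precision and recall as listed in the table for PREDOSE entity identification.
# The F1 value below is derived for illustration, not reported by the source.
predose_f1 = f_score(0.85, 0.72)
print(round(predose_f1, 2))  # prints 0.78
```

Because the harmonic mean is pulled toward the lower of the two inputs, a system with high precision but weak recall (or the reverse) scores lower than the arithmetic mean of the two would suggest.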