Author manuscript; available in PMC: 2023 Jan 1.
Published in final edited form as: Subst Abus. 2021 Jul 2;43(1):273–277. doi: 10.1080/08897077.2021.1941508

An Unsupervised Machine Learning Approach for the Detection and Characterization of Illicit Drug-dealing Comments and Interactions on Instagram

Neal Shah 1, Jiawei Li 2,3, Tim K Mackey 1,4
PMCID: PMC9675406  NIHMSID: NIHMS1842177  PMID: 34214410

Abstract

Background:

Growing use of social media has led to the emergence of virtual controlled substance and illicit drug marketplaces, prompting calls for action by government and law enforcement. Previous studies have analyzed Instagram drug selling via posts. However, comments made by users involving potential drug selling have not been analyzed. In this study, we use unsupervised machine learning to detect and classify prescription and illicit drug-related buying and selling interactions on Instagram.

Methods:

We used over 1,000 drug-related hashtags on Instagram to collect a total of 43,607 Instagram comments between February 1st, 2019 and May 31st, 2019 using data mining approaches in the Python programming language. We then used an unsupervised machine learning approach, the Biterm Topic Model (BTM), to thematically summarize Instagram comments into distinct topic groupings, which were then extracted and manually annotated to detect buying and selling comments.

Results:

We detected 5,589 comments from sellers, prospective buyers, and online pharmacies from 531 unique posts. The vast majority (99.7%) of comments originated from drug sellers and online pharmacies. Key themes from comments included providing contact information through encrypted third-party messaging platforms, drug availability, and price inquiry. Commonly offered drugs for sale included scheduled controlled substances such as Adderall and Xanax, as well as illicit hallucinogens and stimulants. Comments from prospective buyers of drugs most commonly included inquiries about price and availability.

Conclusions:

We detected prescription controlled substances and other illicit drug selling interactions via Instagram comments to posts. We observed that comments were primarily used by sellers offering drugs, and typically not by prospective buyers interacting with sellers. Further research is needed to characterize these “social” drug marketplace interactions on this and other popular social media platforms.

Keywords: online drug marketplace, social media, Instagram, opioid use disorder, infoveillance

INTRODUCTION

The expansion of communication technologies and connectivity has increased the modalities in which prescription and illicit drugs can be bought and sold online.1,2 In parallel with growing technology adoption, illicit marketplaces have evolved to include the use of digital tools, such as social media platforms, which are now ubiquitous in daily life.3–5

Multiple studies have demonstrated that social media platforms, such as Instagram, Facebook, Reddit, and Twitter, serve as marketplaces for illegal sale and distribution of prescription medications and illegal narcotics.3,6–8 Policymakers have sought to curtail these online drug transactions with enactment of the 2008 Ryan Haight Online Pharmacy Consumer Protection Act (RHA), which made it a federal crime to offer the sale of controlled substances online.9 Despite these efforts, social media drug sellers remain active, primarily posting offers for drugs in user-generated comments and through hyperlinked websites.3,6,10,11

One of the most popular social media platforms, particularly among youth and young adults, is Instagram (owned and operated by Facebook), an image and video-sharing site with over 1 billion monthly users, making it the 2nd most active social media network in the world.12 Previous studies have been conducted on the Instagram platform to categorize substance abuse behavior and have used hashtag, image, and text analysis to identify drug-related themes, including identifying drug selling activities.8,13–15

A 2019 study authored by Li et al utilized a deep learning model to detect illegal prescription controlled substance and illicit drug selling posts on Instagram,8 finding 267 unique selling accounts via posts (i.e., when an Instagram account creates a post with text or image content that other followers can then comment on). However, beyond these posts, interactions between sellers and prospective buyers were observed in the comments section of the platform but were not further analyzed. To our knowledge, no studies have specifically explored how interactions in the comments section of Instagram posts relate to drug selling and consumption behavior. Therefore, this study utilizes an unsupervised topic modeling approach to detect and thematically categorize drug selling and buying interactions via Instagram comments.

REVIEW

Our methodology comprised three main phases: data collection, data processing, and content analysis.

Data Collection

We first collected Instagram posts using a custom web scraper that looped through over 1,000 controlled substance and illegal narcotics-related hashtag keyword searches between February 1st, 2019 and May 31st, 2019, allowing both retrospective and prospective data collection. The exact methodology and specific hashtag keywords used are detailed in a separate published study by Li et al.8 All data collection and processing were carried out using the Python™ programming language and associated packages. Given the large volume of posts collected in social media data mining studies, manual annotation of the entire dataset is often impractical, as much of the data is “noise” that contains the queried hashtags but is unrelated to study aims.16,17 Therefore, to identify the subset of data relevant to seller and buyer interactions involving controlled substances and illicit drugs, which we call “signal”, we used natural language processing (NLP) and unsupervised machine learning approaches in our data processing phase to extract Instagram comments exhibiting patterns associated with potential seller and buyer interactions.

Data Processing

Since our study did not rely on an existing training set of Instagram comments, we used an unsupervised machine learning approach called the Biterm Topic Model (BTM), which is designed to detect patterns in short text, to summarize the content of the entire corpus of Instagram comments into distinct topics.18 Before running BTM, the text of each comment was extracted and cleaned through pre-processing procedures to better fit the model and to improve clustering performance (see Online Supplemental Material 1).
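As an illustration of why BTM suits short text, the sketch below extracts biterms (unordered pairs of words that co-occur in the same short text), which are the unit BTM models in place of whole documents. It is a minimal sketch only: the stop-word list, the `tokenize` and `biterms` helpers, and the example comment are hypothetical and are not the pre-processing procedures described in the supplement.

```python
import re
from itertools import combinations

# Hypothetical stop-word list for illustration; not the list used in the study.
STOP_WORDS = {"a", "an", "the", "for", "and", "to", "of", "me", "on", "in"}


def tokenize(comment):
    """Lowercase a comment and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", comment.lower()) if t not in STOP_WORDS]


def biterms(tokens):
    """Return every unordered word pair drawn from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Made-up example comment, not taken from the dataset.
example = "Contact me on WhatsApp for price"
tokens = tokenize(example)
pairs = biterms(tokens)
print(tokens)
print(pairs)
```

Because a comment with n retained tokens yields n(n-1)/2 biterms, even very short comments contribute several co-occurrence observations to the corpus-level model.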

Given the vectorized text corpus and the number of possible topics (denoted k), BTM fit the corpus with a discrete probability distribution over all words for each topic, placing larger weights on words that were most representative of a given topic. We represented each topic by the 20 words with the largest weights. In practice, we experimented with different numbers of topics (k = 5, 10, 15, 20, 25, 30) and chose the best k value by evaluating the clustering performance based on the average coherence score, which measured the isolation level of clustering results (see Online Supplemental Material 2).
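The two selection steps described here (representing a topic by its highest-weighted words, and choosing k by average coherence) can be sketched as follows. This is a hedged illustration: `top_words` and `best_k` are hypothetical helper names, the topic weights and coherence values are made-up toy numbers, and the sketch assumes that a higher average coherence score indicates better clustering.

```python
def top_words(topic_word_weights, n=20):
    """Return the n words with the largest weights for a single topic."""
    ranked = sorted(topic_word_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


def best_k(avg_coherence_by_k):
    """Pick the candidate k with the highest average coherence (assumed: higher is better)."""
    return max(avg_coherence_by_k, key=avg_coherence_by_k.get)


# Toy topic-word weights (made up for illustration).
toy_topic = {"whatsapp": 0.30, "order": 0.25, "xanax": 0.20, "today": 0.05}
print(top_words(toy_topic, n=2))

# Toy average coherence scores for the candidate k values named in the text (made up).
toy_scores = {5: -2.1, 10: -1.8, 15: -1.5, 20: -1.7, 25: -1.9, 30: -2.0}
print(best_k(toy_scores))
```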

After three iterations of BTM, we were able to isolate the clusters that contained topics and weighted keywords related to the potential buying and selling of drugs. Based on the representation of the topics produced by BTM, we selected the topics that contained signal characteristics (e.g., a high frequency and clustering of the keywords “sell” and “buy”, or a high frequency of other “selling argument” terms that are commonly used by online drug sellers).19 A more detailed explanation of how authors deemed specific clusters “signal” is available in Online Supplemental Material 3.

If the volume of comments associated with the selected topics was large, another round of BTM processing was conducted until the volume was reduced to a lower number or the data was thematically saturated with selling-related topics. These topics and their associated Instagram comments were then extracted for content analysis using human annotation.

Content Analysis

After three rounds of BTM, the topics we retrieved contained the highest concentration of comments we believed to be highly associated with illicit online drug sales. The extracted comments were then manually annotated to confirm whether they were related to illegal drug selling and buying interactions. We first developed a codebook to identify accounts seeking to buy controlled substances and illegal drugs, accounts offering the sale of controlled substances and illegal drugs, and online pharmacies, categories identified in prior studies.1,6,20–22

Instagram selling accounts were classified as those offering controlled substances and illicit drugs for sale via the platform, providing information regarding drug availability, shipping terms, or contact information on how to enter into a transaction. Instagram buying accounts were classified as those inquiring about drug availability, product pricing, and delivery information. Online pharmacies were classified as accounts containing any of the characteristics associated with selling accounts, but also including a distinct URL/hyperlink to an external website to purchase drugs or that mentioned “pharmacy”, “pharma”, or “pharm” in their contact information.
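The codebook above was applied by human coders. Purely as an illustration of how two of its criteria could be pre-screened automatically, the sketch below flags comments that contain contact information or an online pharmacy marker. The regular expressions, function names, and example comments are assumptions made for this sketch, and the pharmacy check scans the whole comment rather than only the contact information.

```python
import re

# Illustrative patterns only; these are assumptions, not the study's codebook.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
PHARM_PATTERN = re.compile(r"pharm", re.IGNORECASE)  # matches pharm, pharma, pharmacy
APP_PATTERN = re.compile(r"\b(whatsapp|kik|wickr)\b", re.IGNORECASE)
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def has_contact_info(comment):
    """True if the comment names a messaging app, a phone-like number, or a URL."""
    return bool(
        APP_PATTERN.search(comment)
        or PHONE_PATTERN.search(comment)
        or URL_PATTERN.search(comment)
    )


def looks_like_online_pharmacy(comment):
    """True if the comment carries a URL or a 'pharm'-type term."""
    return bool(URL_PATTERN.search(comment) or PHARM_PATTERN.search(comment))


# Made-up comments for illustration, not drawn from the dataset.
print(has_contact_info("Text me on Wickr: examplehandle"))
print(looks_like_online_pharmacy("Order at www.example.com today"))
print(looks_like_online_pharmacy("Great post!"))
```

A screen of this kind could only narrow the set of comments for review; judging whether a comment shows intent to promote drug sales still requires manual annotation.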

A comment was labeled as “signal” only when the following conditions were met: (1) the text in the comment demonstrated intention to promote drug sales; and (2) the comment offered at least one kind of contact information method. The annotation process was carried out by two human coders. The first and second authors coded posts independently and achieved high intercoder reliability (kappa=0.98). For inconsistent results, authors reviewed and conferred on the correct classification with the last author. We then aggregated comments labeled as “signal” by the individual Instagram accounts they were associated with and conducted content analysis of additional comments from each of these accounts. For each user, we summarized the total number of drug-dealing comments made, the number of comments with unique content, and the number of unique posts commented under.
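For readers unfamiliar with the intercoder reliability statistic reported above, Cohen's kappa compares the observed agreement between two coders with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two lists of labels. The toy labels are made up for illustration and are not the study's annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels (made up): the coders disagree on one of six comments.
coder_1 = ["signal", "signal", "noise", "noise", "signal", "noise"]
coder_2 = ["signal", "signal", "noise", "signal", "signal", "noise"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

In this toy case the observed agreement is 5/6 and the chance agreement is 0.5, giving a kappa of about 0.667.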

Authors also conducted a qualitative sub-analysis on buying-selling interactions to examine how comments detected in Instagram posts changed over time for the purpose of characterizing how sellers adapt in messaging when targeting potential buyers. Full results are included in Online Supplemental Material 4.

RESULTS

We collected a total of 12,614 Instagram posts based on our controlled substance and illicit drug-related hashtag search terms, which contained a total of 43,607 Instagram comments. After BTM, we retained the clusters that contained drug-dealing keywords.

We manually coded all 7,491 comments in selected clusters to confirm and characterize drug selling behavior and types of seller and buyer accounts. We detected 5,589 comments, originating from 2,024 Instagram posts, made by 531 unique Instagram accounts that we identified as selling drugs. Of the 5,589 comments identified, the majority were categorized as originating from sellers (97.4%, n=5,399), with far fewer categorized as online pharmacies (2.9%, n=129) or buyers (0.3%, n=13). Examples of these categories are provided in Figure 1.

Figure 1. Examples of illicit-drug dealing comments.

Three key themes emerged from the content analysis of this signal dataset: (1) the exchange of contact information between sellers and buyers; (2) discussions about availability of drug products; and (3) pricing inquiries. For the first category, “contact information” included sellers disclosing their phone numbers, WhatsApp information, or other contact information for encrypted messaging apps. “Availability” included conversations and interactions about the quantities of drugs available, types of drugs being sold, and details on delivery information. Finally, “price inquiry” was primarily initiated by buyers and involved interactions where an inquiry was made about the price of drugs being sold. In total, 5,498 (98.3%) comments included contact information, 12 (0.2%) included a price inquiry, and only 1 (0.02%) comment related to availability.

Regarding the types of drugs offered, most drug-related comments listed multiple illegal narcotics and prescription controlled substances offered for sale. Of the 5,399 drug dealing comments, 5,234 (96.9%) specifically mentioned drugs that were being sold by a seller, and the remaining 165 (3.1%) comments simply provided contact information without specific mentions of the drugs that were being offered (though prior comments and content from these users related to drug selling). Numerous illicit drugs were detected, most commonly lysergic acid diethylamide, psilocybin or hallucinogenic mushrooms, and cocaine. For controlled substances, the most common were Xanax (alprazolam), Adderall (amphetamine-dextroamphetamine), and OxyContin (oxycodone). Sparse mentions of Ritalin (methylphenidate), Vyvanse (lisdexamfetamine dimesylate), and benzodiazepines were also observed.

We note that comments discussing sale of illicit drugs and controlled substances often contained multiple product mentions, so co-occurrence of drug mentions in a single comment or via an Instagram account was common. Of the 5,234 comments that included specific mentions of illicit drugs or controlled substances, comments selling prescription controlled substances were the most common (94.0%, n=4,920). The second most common were hallucinogenic substances (71.1%, n=3,720), which were almost always lysergic acid diethylamide or psilocybin hallucinogenic mushrooms. Additionally, 3,566 (68.1%) mentioned illicit drug stimulants, most commonly cocaine.
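The percentages in this paragraph share the denominator of 5,234 comments with specific drug mentions, and they sum to well over 100% because a single comment can mention several drug categories. A short check of that arithmetic, using only the counts reported above (the variable names are ours):

```python
# Counts as reported in the paragraph above.
comments_with_drug_mentions = 5234
category_counts = {
    "prescription controlled substances": 4920,
    "hallucinogens": 3720,
    "illicit stimulants": 3566,
}

# Share of drug-mentioning comments that fall in each category.
shares = {
    name: round(100 * count / comments_with_drug_mentions, 1)
    for name, count in category_counts.items()
}

# Category mentions beyond one per comment, which reflects co-occurrence.
overlap_excess = sum(category_counts.values()) - comments_with_drug_mentions

print(shares)
print(overlap_excess)
```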

Most interactions via Instagram comments between selling and buying user accounts followed a similar pattern that can be divided into two distinct parts: (a) a drug seller or online pharmacy described the product being sold (illegal narcotic or controlled substance), and (b) the seller then provided contact information (WhatsApp, Kik, Wickr, URL) when a buyer expressed interest in a transaction. In general, these drug transaction comments did not relate to the theme or message of the parent Instagram post (and associated image) under which they appeared; rather, these interactions simply appeared in posts that contained drug-specific hashtags, regardless of the original topic of the post.

For instance, one parent post we reviewed was related to the discovery of a vial of medicine as a historical artifact, but its opioid-related hashtag prompted numerous drug sellers to comment on the post. The use of a drug-related hashtag, not the content of the parent post, was the main indicator of whether drug sellers would comment. Similarly, posts about people’s personal journeys in substance abuse recovery and treatment after opioid addiction prompted many comments from drug sellers.

CONCLUSIONS

This study focused on the detection and characterization of illicit drug selling interactions via user comments on Instagram. Prior published studies have primarily focused analysis on Instagram posts and Twitter tweets, but have not conducted analysis on the comments made in response to generated posts or tweets. Our results indicate that illegal drug selling activity occurs in the Instagram comments section even if the parent post is unrelated to drug diversion and selling topics.

Additionally, though the focus of this study was to identify interactions between drug sellers and prospective buyers, the vast majority of comments we detected (99.85%) were generated by drug sellers or online pharmacies and did not include interaction with buyers. Interactions between sellers and buyers we observed focused on transactional aspects, including how sellers could be contacted, availability of product, and questions about pricing, but were low in volume. The high percentage of seller comments containing information about other communication tools (e.g., encrypted communication applications and other contact information listed) may indicate that selling and buying transactions occur elsewhere, and more specifically in closed-communication applications.

Results point to new characteristics on potential patterns and strategies of how illegal drug selling occurs via social media platforms. Specifically, prior studies on Instagram have found that parent posts by drug dealers often include relevant drug-related hashtags, images of available product, and usually contain contact information.8,13,14 In contrast, comments reviewed in this study generally were associated with popular drug-related hashtags, regardless of whether the original post was associated with drug selling or buying.

Our study has certain limitations. First, this study was limited to a specific period of data collection and we did not purposely sample Instagram accounts or consider user characteristics such as age, gender, or other demographics. Hence, study results are not generalizable. Importantly, we did not interact with accounts to confirm if drugs purportedly being sold were actually available. Conducting test purchases of controlled substances and other illicit drugs is prohibited by federal law, including for research purposes. Additionally, there was a time lag between data collection, BTM analysis, and manual annotation that resulted in some posts and comments being removed or modified before being manually inspected by authors. Future studies should develop tools to increase the speed of data analysis using machine learning approaches such as classification of text, images, and other metadata.

Supplementary Material

Supp 1

Table 1: Number of Instagram posts and comments remaining after each round of BTM

Step                                                   # of posts   # of comments
Raw data (JSON file after removing duplicate posts)    12,614       43,607
After 1st round of BTM (6 topics removed)              4,062        11,858
After 2nd round of BTM (2 topics removed)              3,533        9,545
After 3rd round of BTM (1 topic removed)               3,433        9,274
After removing comments with length less than 2        2,981        7,491

Table 1 shows the number of posts and comments remaining after each round of BTM, in which clusters that did not contain signal characteristics were removed.
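To make the size of each filtering step explicit, the comment counts in Table 1 can be differenced directly. The snippet below uses only the figures from the table; the step labels and variable names are ours.

```python
# Comment counts remaining after each step, as listed in Table 1.
steps = [
    ("Raw data", 43607),
    ("After 1st round of BTM", 11858),
    ("After 2nd round of BTM", 9545),
    ("After 3rd round of BTM", 9274),
    ("After removing short comments", 7491),
]

# Number of comments removed at each step relative to the previous one.
removed = [
    (label, previous - current)
    for (_, previous), (label, current) in zip(steps, steps[1:])
]

# Share of the raw comments that reached manual annotation.
retained_share = round(100 * steps[-1][1] / steps[0][1], 1)

for label, count in removed:
    print(f"{label}: {count} comments removed")
print(f"Retained for manual coding: {retained_share}%")
```

The first round accounts for most of the reduction, and roughly one in six of the raw comments was carried forward to manual coding.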

Funding Source:

Disclosure:

This study was funded by a grant from the National Institute on Drug Abuse (1R21DA050689-01). The funding organization had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Footnotes

DECLARATIONS:

Ethics Approval and Consent to Participate: Not applicable/Not required for this study. All information collected from this study was from the public domain and the study did not involve any interaction with users. User identifiable information was removed from the study results.

Consent to Publish: Not applicable.

Competing Interests: JL and TKM are employees of the startup company S-3 Research LLC. S-3 Research is a startup funded and currently supported by the National Institutes of Health – National Institute on Drug Abuse through a Small Business Innovation and Research contract for opioid-related social media research and technology commercialization. Authors report no other conflict of interest associated with this manuscript.

Availability of Data and Materials:

Data collected on social media platforms is available on request from authors subject to appropriate de-identification.

REFERENCES:

1. Mackey TK. Opioids and the Internet: Convergence of Technology and Policy to Address the Illicit Online Sales of Opioids. Health Serv Insights. SAGE Publications; 2018. Sep 14;11:117863291880099.
2. Seaman I, Giraud-Carrier C. Prevalence and Attitudes about Illicit and Prescription Drugs on Twitter. IEEE; 2016. pp. 14–7.
3. Moyle L, Childs A, Coomber R, Barratt MJ. #Drugsforsale: An exploration of the use of social media and encrypted messaging apps to supply and access drugs. International Journal of Drug Policy. Elsevier; 2019. Jan 1;63:101–10.
4. Demant J, Bakken SA, Oksanen A, Gunnlaugsson H. Drug dealing on Facebook, Snapchat and Instagram: A qualitative analysis of novel drug markets in the Nordic countries. Drug and Alcohol Review. John Wiley & Sons, Ltd (10.1111); 2019. May 1;38(4):377–85.
5. Graves RL, Tufts C, Meisel ZF, Polsky D, Ungar L, Merchant RM. Opioid Discussion in the Twittersphere. Subst Use Misuse. 2018. May 14;53(13):2132–9.
6. Mackey TK, Kalyanam J, Katsuki T, Lanckriet G. Twitter-Based Detection of Illegal Online Sale of Prescription Opioid. American Journal of Public Health. 2017. Dec;107(12):1910–5.
7. Van Hout MC, Hearne E. New psychoactive substances (NPS) on cryptomarket fora: An exploratory study of characteristics of forum activity between NPS buyers and vendors. International Journal of Drug Policy. Elsevier; 2017. Feb 1;40:102–10.
8. Li J, Xu Q, Shah N, Mackey TK. A Machine Learning Approach for the Detection and Characterization of Illicit Drug Dealers on Instagram: Model Evaluation Study. J Med Internet Res. 2019. Jun 15;21(6):e13803.
9. Mackey TK, Liang BA, Strathdee SA. Digital social media, youth, and nonmedical use of prescription drugs: the need for reform. J Med Internet Res. 2013;15(7):e143.
10. Bachhuber MA, Cunningham CO. Availability of buprenorphine on the Internet for purchase without a prescription. Drug Alcohol Depend. 2012. Nov;130(1–3):238–40.
11. Dasgupta N. Crowdsourcing Black Market Prices For Prescription Opioids. Journal of Medical Internet Research. JMIR Publications Inc., Toronto, Canada; 2013;15(8):e178.
12. Smith A, Anderson M. Social Media Use in 2018 [Internet]. pewinternet.org. 2018. [cited 2018 Mar 22]. Available from: http://www.pewinternet.org/2018/03/01/social-media-use-in-2018/
13. Yang X, Luo J. Tracking Illicit Drug Dealing and Abuse on Instagram Using Multimodal Analysis. ACM Transactions on Intelligent Systems and Technology (TIST). ACM; 2017. Jun 23;8(4):58–15.
14. Zhou Y, Sani N, Luo J. Fine-grained mining of illicit drug use patterns using social multimedia data from instagram. IEEE; 2016. pp. 1921–30.
15. Petersen MA, Petersen IL, Poulsen C, Nørgaard LS. #studydrugs-Persuasive posting on Instagram. Int J Drug Policy. 2021. Jan 12;:103100.
16. Kalyanam J, Mackey TK. A Review of Digital Surveillance Methods and Approaches to Combat Prescription Drug Abuse. Current Addiction Reports. Springer International Publishing; 2017. Sep 18;4(4):397–409.
17. Kalyanam J, Katsuki T, R G Lanckriet G, Mackey TK. Exploring trends of nonmedical use of prescription drugs and polydrug abuse in the Twittersphere using unsupervised machine learning. Addictive Behaviors. 2017. Feb;65:289–95.
18. Yan X, Guo J, Lan Y, Cheng X. A biterm topic model for short texts. the 22nd international conference. New York, New York, USA: ACM; 2013. 12 p.
19. Orizio G, Rubinelli S, Schulz PJ, Domenighini S, Bressanelli M, Caimi L, et al. “Save 30% if you buy today.” Online pharmacies and the enhancement of peripheral thinking in consumers. Pharmacoepidemiol Drug Saf. 2010. Sep;19(9):970–6.
20. Mackey T. Solution to Detect, Classify, and Report Illicit Online Marketing and Sales of Controlled Substances via Twitter: Using Machine Learning and Web Forensics to Combat Digital Opioid Access. Journal of Medical Internet Research. JMIR Publications Inc., Toronto, Canada; 2018;20(4):e10029.
21. Forman RF. Availability of opioids on the Internet. JAMA. 2003. Aug 20;290(7):889.
22. Forman RF, Woody GE, McLellan T, Lynch KG. The availability of web sites offering to sell opioid medications without prescriptions. Am J Psychiatry. 2006. Jul;163(7):1233–8.
