Abstract
Social media use is growing globally, with a reported 3 billion active users in 2017. This medium is used increasingly in a health setting by patients (and to a limited extent, healthcare professionals) to share experiences and ask advice on medical conditions as well as pharmaceutical products. In recent years, attention has turned to this huge, generally untapped, source of potential health information as a possible tool for pharmacovigilance, and in particular signal detection. In this article we explore some of the challenges of utilizing social media for safety signal detection and look at some of the pilot studies conducted to date in order to weigh the evidence for and against the utility of social media data in safety signal detection. After doing so we can conclude that the analysis of social media datasets has demonstrated a limited contribution to the signal detection and signal management process. The data available in social media can complement blind spots in traditional pharmacovigilance datasets and provide significant value for targeted investigations and studies such as those relating to abuse, misuse, use in pregnancy, and patient sentiments.
Keywords: internet, social media, signal detection, signal management, pharmacovigilance
Introduction
Spontaneous adverse event (AE) reporting, long the bedrock of pharmacovigilance signal detection, suffers from reporting biases including under-reporting, particularly of well-known or nonserious events or those associated with misuse, abuse or medication error.1 In contrast, users of social media (defined as a collection of websites and applications that enable users to create and share content or to participate in social networking2) appear willing to share details of their experiences, often seeking assurances that they are not alone or looking for advice. Social media is becoming ubiquitous in our society, with a recent report by We Are Social and Hootsuite3 indicating that social media use is still rapidly growing with 3 billion active social media users globally, that is, 40% of the population. It is therefore perhaps not surprising that over recent years there has been a surge of interest in investigating whether social media provides a valuable tool for understanding more about patients’ and, to a limited extent, healthcare providers’ experience with pharmaceutical products and in particular for detecting safety signals.4–9 We seek to review some of the work that has been performed to date and discuss some of the challenges in order to draw a conclusion as to the utility of using social media in signal detection.
Challenges in using social media as a data source
Sloane and colleagues10 comment that ‘In order to realize the benefits social media holds, a number of technical, regulatory and ethical challenges remain to be addressed’. This view is supported by Bousquet and colleagues,11 who list five challenges to overcome to effectively operationalize the analysis of patient posts: variable quality of information, data privacy issues, pharmacovigilance expert expectations, identification of relevant information and robust architecture for accessing information. We suggest that the technical challenges have been largely overcome, the regulatory challenges are becoming clearer thanks to work performed in public–private partnerships such as WEB-RADR,12,13 but the ethical challenges remain. Additionally, it is not known whether current methods for signal detection on standard pharmacovigilance datasets, for example disproportionality analysis, are appropriate for data obtained through social media.
It is clear that when it comes to social media there is a huge volume of information to be mined, but it is unstructured and primarily written using nonmedical terminology, slang or even pictograms (emojis or emoticons). Drugs can be described by various brand or generic names which may be misspelt or in some instances the product may have a nickname or ‘street’ name. All of this can make identification and attribution of possible AE reports even harder. In addition, the informal nature of social media posts can make it difficult to distinguish between AEs, potential benefits and underlying disease symptoms. This obfuscation has necessitated the concept of a ‘proto AE’ or a post that resembles an AE.14
Current approaches utilize natural language processing (NLP) to extract and classify information from social media sources to enable further processing for pharmacovigilance purposes.7,14,15 These algorithms are generally combined with a level of human ‘curation’ to produce a set of so-called ‘proto AEs’ for further analysis. NLP algorithms are being continually developed and one output from the WEB-RADR project has been the development of a ‘gold standard’ reference set that can be used to enhance the analytics of social media.16
Pharmacovigilance is a heavily regulated area and marketing authorization holders (MAHs) have perhaps been reluctant to look at social media as a data source for signal detection due to concerns that it would result in a large number of AE reports that require entry onto company safety databases and reporting to regulators. In Europe, the guideline on good pharmacovigilance practices module VI17 states that if a MAH becomes aware of a report of a suspected adverse reaction described in any noncompany-sponsored digital medium, the report should be assessed to determine whether it qualifies for submission as an individual case safety report (ICSR). Unsolicited cases of suspected adverse reactions from the internet or digital media should be handled as spontaneous reports. The same submission time frames as for spontaneous reports should be applied. This raises questions such as what constitutes a valid report for adverse reaction reporting from social media? Should MAHs attempt to perform follow up with users of social media for the purpose of ICSR reporting? What are the obligations for MAHs to screen social media for ICSR collection? The WEB-RADR project has attempted to address these questions in the output from Work Package 1.12
Assuming that the tools are available to screen social media posts and regulatory obligations are met, the question remains of whether, ethically, social media posts should be used for pharmacovigilance purposes. Patients are generally not actively ‘reporting’ what they have experienced, they are ‘talking’ to an online community or ‘asking questions’ of fellow patients. In such a situation do they have an understanding that ‘digital listening’ is taking place and that their conversations are being made available to large corporations for various purposes? Conversely, would patients have an expectation that the product manufacturer might be listening; indeed is that their expectation in the hope of some sort of action or reaction? Some of these points were raised in the early parts of the WEB-RADR project and have helped to shape the recommendations.12,18
Data from social media are obtained through third party providers in aggregate, usually stripped of individual identifiers, which also limits the MAH’s ability to follow up for further information, determine if there is one or several patients experiencing an event, or provide further comment, context or information. This raises concerns in some quarters as to the expectations of a MAH if social media posts relate to serious, previously undetected events or issues. Under normal circumstances, a MAH would be expected to show due diligence in seeking further information from the reporter, but if the information has been found during digital listening or data mining then there are most likely no details available to follow up. WEB-RADR has tried to add guidance in this area with its recommendation that data-mining activities are classed as secondary use of data with no planned interactions with patients.12
Most pilot studies available in the published literature have relied on using proportional reporting ratio (PRR) as a method of signal detection in social media.7,15 While this method works with other datasets used in pharmacovigilance, its suitability in evaluating social media data is yet to be confirmed. Caster and colleagues19 reported using receiver operating characteristics (ROC) and maximum area under the ROC curve (AUC) for comparison of Vigibase and social media data. They concluded that disproportionality analysis on Facebook and Twitter data performed worse than global spontaneous reporting data, when benchmarked against historical label changes. However, they attributed this finding to the limited labelled data retrieval from social media rather than the disproportionality methods. This is clearly an area that merits further investigation and research.
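For readers less familiar with the method, the PRR compares the proportion of reports mentioning a given event among all reports for the product of interest with the same proportion among all other products in the dataset. The following is a minimal sketch of that calculation only; the function name and the counts are hypothetical and are not taken from any of the studies cited.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """Compute the PRR from a 2x2 contingency table.

    a: reports with the product of interest and the event of interest
    b: reports with the product of interest and any other event
    c: reports with all other products and the event of interest
    d: reports with all other products and any other event
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each row of the table needs at least one report")
    if c == 0:
        raise ValueError("PRR is undefined when no comparator report mentions the event")
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts, for illustration only.
prr = proportional_reporting_ratio(a=12, b=188, c=30, d=1970)
```

With small counts, as is typical for social media datasets, the ratio becomes unstable, which is one reason the suitability of the method for such data remains an open question.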
Does the use of social media make a difference in pharmacovigilance obligations?
Given all the challenges MAHs face when evaluating social media signal detection, the central question remains: does it make a difference? In order to try and answer this, a number of groups have conducted pilot studies looking for signals in sets of social media data.7,14,15 The primary objectives of these studies are broadly similar; namely to determine if a set of aggregate social media data (Twitter, patient forums) generated by NLP and analysed using PRRs is not only suitable for safety surveillance but also valuable (for example, finding previously undetected signals or finding them sooner than traditional methods). To some extent the feasibility of the studies rested on whether there was sufficient volume of product mentions and proto AEs to allow for the application of quantitative methods for analysis. In addition, information from social media was often compared with ‘traditional’ data sources, such as the US Food and Drug Administration Adverse Event Reporting System (FAERS) database7,14 or World Health Organization Vigibase,20 to determine if the analysis of social media detects a safety signal not previously detected or found earlier than using traditional methods.
We conducted a 2-year (1 July 2014 to 30 June 2016), retrospective feasibility study under an observational research review group approved protocol to evaluate the contribution of social media data to safety signal detection. Data from publicly available sources (Twitter, patient and other online forums) were analysed using NLP and quantitative methods (descriptive statistics and disproportionality). Under the observational research protocol, the use of the data was considered secondary as it was not generated with the objective of identifying safety signals, nullifying ICSR reporting obligations. Additionally, any individuals interacting with the data did so only in aggregate with no access to individual post level data. Any findings were evaluated in aggregate and documented utilizing the company signal management process. Success criteria were defined as sufficient mentions of products to support further analyses, identification of any new safety signals, and faster identification of known signals than using traditional sources.
The social media data were analysed using Epidemico’s MedWatcher Social platform.21 English language posts were selected for inclusion if the post contained mention of a product under study (international nonproprietary name (INN) or trade name including misspellings and slang terms). A Bayesian classifier then removed duplicates and spam and determined whether the language in the post described an AE. The classifier translates vernacular and colloquialisms into Medical Dictionary for Regulatory Activities (MedDRA) terminology and has been trained and improved by manual human curation. A post containing both a product mention and an AE was considered to be a proto AE and only aggregate data were provided to and analysed by Amgen.
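The definition of a proto AE can be illustrated with a toy example. The sketch below is not the MedWatcher Social classifier, which is Bayesian and trained with human curation; it substitutes a simple keyword lookup for the classifier, and the product name, its variants, the vernacular phrases and the terms they map to are all hypothetical.

```python
import re

# Hypothetical lookup tables. A real system would use a trained classifier
# and a curated mapping from vernacular to MedDRA terminology.
PRODUCT_VARIANTS = {
    "examplimab": "examplimab",
    "examplimabb": "examplimab",  # misspelling
    "exa": "examplimab",          # slang
}
VERNACULAR_TO_TERM = {
    "threw up": "Vomiting",
    "wiped out": "Fatigue",
    "pounding head": "Headache",
}


def find_proto_aes(post: str) -> list[tuple[str, str]]:
    """Return (product, event term) pairs for a post that mentions both."""
    text = post.lower()
    words = set(re.findall(r"[a-z]+", text))
    # Product variants are matched as whole words, event phrases as substrings.
    products = {name for variant, name in PRODUCT_VARIANTS.items() if variant in words}
    events = {term for phrase, term in VERNACULAR_TO_TERM.items() if phrase in text}
    return sorted((p, e) for p in products for e in events)
```

A post mentioning a product without an event, or an event without a product, yields an empty list and so would not count as a proto AE under this definition.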
Of the 15 marketed products that were queried in MedWatcher Social to assess the number of product mentions and proto AEs, four products were excluded from the study due to insufficient numbers of mentions to conduct a meaningful analysis. During the product selection, a weak positive correlation was observed between social media mentions and cumulative postmarket exposure in patient years, as anticipated. As the analysis dataset was restricted to English language posts, it is not surprising that the overwhelming majority of posts originated from the United States, the United Kingdom, Canada and Australia (see Figure 1). Nearly all (96%) of the proto AEs were posted by patients, with only 4% posted by caregivers (Figure 2). These data support the hypothesis that social media is a useful source for obtaining direct patient experience with a medicinal product.
The social media proto AE data were compared with FAERS data from the same time period on the basis of counts stratified by both system organ class (SOC) and preferred term (PT), seriousness and listedness (Figure 3). The SOC profile for social media proto AEs and FAERS statistics of disproportionate reporting shows a striking difference in the nature of reported AEs (Figure 4). For example, the fact that ‘General disorders and administration site conditions’ is the most prevalent SOC in social media (45.77% of proto AEs) might suggest that patients are less specific in their description of events than healthcare professionals (HCPs), who report the majority of cases in FAERS. ‘Musculoskeletal and connective tissue disorders’ (12.48%) and ‘Gastrointestinal disorders’ (6.85%) are the next most prevalent SOCs, suggesting patients are more likely to report events indicative of tolerability and symptomatology than HCPs.
It might be surmised that patients are likely to report nonserious AEs and so the social media proto AEs were further compared with FAERS on the basis of seriousness using the European Medicines Agency Important Medical Events (IME) list (MedDRA 19.1) as a standard. The social media proto AEs were at most, on a per product basis, 13.58% serious (Figure 5). In comparison, the terms reported in FAERS were at least 28.2% serious for 9 of 10 products in the analysis dataset. This again suggests patients are reporting a fundamentally different type of event in social media than an HCP would report to a MAH or health authority.
A further analysis was performed to determine if social media data would allow for the earlier detection of signals than traditional data sources. Based on earlier studies, a statistic of disproportionate reporting threshold was used to identify signals from the social media proto AEs.14 A proto AE was considered a signal of disproportionate reporting (SDR) if it was unlisted per company core datasheet, its PRR exceeded 2.0, and it was reported more than twice. While disproportionality algorithms are a useful prioritization tool for the hundreds of drug-event combinations in large datasets, the role of a human reviewer for providing proper clinical context and judgement for escalation to the signal management process cannot be eliminated. All proto AEs meeting disproportionality criteria were presented to product-specific physicians for review. No new safety signals were identified for further assessment. This review proved challenging for physicians as no post level data were available (by study design), only the number of mentions of the drug–event combination and disproportionality statistic. Even if post level data were available, the often short patient reported experience in social media may not be as useful in comparison to FAERS and Vigibase.
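The three SDR criteria can be expressed as a simple filter over aggregate drug–event summaries. The following is a sketch under stated assumptions rather than the study's implementation: the class, field and function names are hypothetical, the PRR is assumed to have been computed beforehand, and the example rows are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugEventSummary:
    product: str
    event_term: str
    report_count: int  # number of proto AEs for this drug-event combination
    prr: float         # precomputed proportional reporting ratio
    is_listed: bool    # True if the event is listed in the core datasheet


def is_sdr(summary: DrugEventSummary, prr_threshold: float = 2.0, min_reports: int = 3) -> bool:
    """Unlisted, PRR above the threshold, and reported more than twice."""
    return (
        not summary.is_listed
        and summary.prr > prr_threshold
        and summary.report_count >= min_reports
    )


# Hypothetical drug-event combinations, for illustration only.
candidates = [
    DrugEventSummary("product A", "Fatigue", report_count=14, prr=3.1, is_listed=True),
    DrugEventSummary("product A", "Dizziness", report_count=5, prr=2.4, is_listed=False),
    DrugEventSummary("product B", "Insomnia", report_count=2, prr=6.0, is_listed=False),
]
for_physician_review = [c for c in candidates if is_sdr(c)]
```

In this invented example only the second combination passes all three criteria. The filter prioritizes combinations for review; it does not replace the clinical judgement of the reviewing physician.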
In order to assess the utility of social media for the early identification of safety signals, the date of the third social media report was considered the date of identification in social media for proto AEs meeting disproportionality criteria. As this study considered a 2-year interval of social media data, the third report in the study dataset may not be the third ever if any predated the interval. With this limitation in mind, proto AEs meeting disproportionality criteria with three or more reports in the study interval were compared against data from the Amgen signal management system. Five of six social media SDRs were identified sooner in traditional pharmacovigilance data sources (Figure 6). In the case of the sixth event, the associated product is a second-line therapy and the event is a known side effect of the primary treatment, therefore it cannot be suggested that social media post reporting would have led to earlier detection. This suggests routine surveillance of social media would not aid in the earlier identification of safety signals for assessment.
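The timing rule described above, in which the date of the third report is taken as the date of identification, can be sketched as follows. The function names and the dates are hypothetical; the sketch also carries the limitation already noted, since it sees only reports inside the study interval.

```python
from datetime import date
from typing import Optional


def identification_date(report_dates: list[date], nth: int = 3) -> Optional[date]:
    """Return the date of the nth report in the interval, or None if fewer exist."""
    if len(report_dates) < nth:
        return None
    return sorted(report_dates)[nth - 1]


def earlier_in_social_media(report_dates: list[date], traditional_date: date) -> bool:
    """True only if the third social media report predates the traditional signal."""
    found = identification_date(report_dates)
    return found is not None and found < traditional_date


# Hypothetical post dates within the 1 July 2014 to 30 June 2016 study interval.
posts = [date(2015, 3, 2), date(2014, 11, 20), date(2015, 6, 9), date(2016, 1, 15)]
third_report = identification_date(posts)
```

Comparing `third_report` with the date on which the same drug–event combination entered the signal management system gives the earlier or later classification used in the temporal analysis.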
Overall, this pilot study did not demonstrate utility of performing routine signal detection on social media data. The number of product mentions showed variability and in some cases was not sufficient for further analysis. The events reported by patients in social media were significantly different than events reported in FAERS by HCPs with respect to the type of events and seriousness. Proto AEs meeting disproportionality criteria did not identify any new safety observations or signals following review by product physicians. While the temporal analysis was limited by the 2-year interval of study data, the results do not suggest signals would be identified earlier in social media than traditional sources. These results and other studies to date do not provide sufficient evidence to add social media to the routine sources of signal detection. Further work is required to identify the proper methodology for statistical signal detection in social media data considering the ever changing reporting trends and limited available information.
The results of our study have been comparable to the publicly available results of pilot studies conducted by other groups. Earlier studies have also concluded that the nature of events reported within social media is fundamentally different than other sources and may contain more product complaints, lack of effect and even benefits.7,15 Several pilot study publications have suggested social media may be used to evaluate targeted research questions in the real world, such as use in pregnancy, misuse, abuse and even benefits.22,23 Purrington24 reported the results of a pilot signal detection study in the forum PatientsLikeMe. Again this study concluded that the patient-reported events were generally nonserious and were focused on quality of life terms, and labelled accordingly. The author suggested the small amount of data in this specific forum would limit its utility as a standalone signal detection source, but it would supplement other routine activities.
Bhattacharya and colleagues7 conducted a study using 26 months of social media data for six products. They reported that analysis of social media data did not identify new or previously identified signals. The authors suggested this may be due to low numbers of reports in the social media dataset examined combined with difficulty in coding social media posts with standard medical dictionaries such as MedDRA. In our own study we also found that there was variability in volume of proto AEs between products studied and that the volume of events from social media was substantially smaller than the reporting volume from the company safety database or FAERS. Even larger studies, such as the Innovative Medicines Initiative WEB-RADR project, struggled to have comparative records for review. WEB-RADR included 38 products across multiple manufacturers and was able to assess a dataset of 40,000 tweets reporting 56,000 product event combinations which was compared to the 613,000 product event combinations in the WHO Vigibase. However, no matter what size the dataset analysed, all groups appeared to come to a fairly consistent conclusion that, in general, data mining in social media datasets does not reveal new safety signals or observations.
Pierce and colleagues14 compared the first post of an AE in social media to signals detected in FAERS for 10 FDA post-marketing products. They identified only one report that occurred in social media prior to signal detection in FAERS, leading them to conclude that social media monitoring may provide earlier insights into certain AEs. Duh and colleagues8 also reported finding an earlier post of an AE in social media compared with FAERS for an AE of one of the two drugs studied by them. However, one may argue that identification of a single proto AE post is not equivalent to a statistical signal being identified. Both quantitative methods would require human review in a clinical context for further evaluation.
Large social media datasets may however contain data-rich areas where certain conditions, preferred terms or products are over represented. For example, in the WEB-RADR project drug tolerance and dependence were seen to be particularly data-rich areas for further assessment. This finding was also reported by Bhattacharya and colleagues,7 who found that the social media data did provide insights into areas such as medication tolerability, adherence to treatment and quality of life improvements. The authors noted that these patient insights are often lacking in ‘traditional’ AE reports, which frequently are produced by healthcare providers. It is clear that further refining the NLP algorithms used may further enrich these seams of data which could add to the utility of social media in the pharmacovigilance space.
So is social media a reliable tool for safety signal detection?
Our objective was to determine whether social media is a reliable data source for safety signal detection. This curiosity was driven by the idea that social media represents a potentially huge untapped data source of direct patient experiences. Whilst social media use in society is ever increasing, the data that are currently available for pharmacovigilance are relatively limited (Twitter, some patient forums) due to access and privacy considerations. In addition, the unstructured nature of social media posts, together with use of informal and nonmedical language, makes the data that are available difficult to mine in a systematic way. With these caveats in mind it is perhaps not surprising that our pilot study, along with WEB-RADR and other research to date, does not provide sufficient evidence that signal detection using social media can detect signals not found in other datasets or earlier than traditional methods.
However, social media are not without utility in pharmacovigilance. Studies have shown that patients’ posts may be used to gain better insight into how medicines are used in real life, including how they are misused. They also give us a window into the patients’ overall experience with their medication and its impact on their quality of life. Whilst this is not always easily captured in a structured dictionary, it is clear that certain ‘data-rich’ areas are worth further study. These biases in the data also impact the statistical tools traditionally applied to well structured databases to determine if an AE term is being reported disproportionately. Hence more robust statistical signal detection methods may need to be established to accommodate the volatility of data sources in social media. In addition, dictionaries more able to capture informal nonmedical terms need further development to allow meaningful data capture and analysis.
So is social media monitoring a reliable tool for signal detection? We would conclude that further work is necessary before it can be seen as a standalone source for routine signal detection. Until then, there may be some areas where information present in social media could be valuable to augment traditional data sources or in signal evaluation.
Acknowledgments
The authors would like to acknowledge Alice Hsu for her support in our study. We also acknowledge the help from colleagues in the Amgen Global Patient Safety Therapy Area team.
Footnotes
Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Conflict of interest statement: Sue Rees is an employee of Amgen Ltd and Sadiqa Mian and Neal Grabowski are employees of Amgen Inc. All authors hold stock in Amgen Inc.
Contributor Information
Sue Rees, Global Patient Safety & Labelling, Amgen Ltd, 240 Cambridge Science Park, Milton Road, Cambridge, CB4 0WD, UK.
Sadiqa Mian, Global Patient Safety & Labelling, Amgen Inc. Thousand Oaks, California USA.
Neal Grabowski, Global Patient Safety & Labelling, Amgen Inc. Thousand Oaks, California USA.
References
- 1. Almenoff J, Tonning JM, Gould AL, et al. Perspectives on the use of data mining in pharmacovigilance. Drug Safety 2005; 28: 981–1007.
- 2. Dictionary OEL. https://en.oxforddictionaries.com/definition/social_media
- 3. Social WA. Three billion people are now using social media, Hootsuite, 2017. https://wearesocial.com/uk/blog/2017/08/three-billion-people-now-use-social-media.
- 4. De Martino I, D’Apolito R, McLawhorn AS, et al. Social media for patients: benefits and drawbacks. Curr Rev Musculoskelet Med 2017; 10: 141–145.
- 5. Fehring KA, De Martino I, McLawhorn AS, et al. Social media: physicians-to-physicians education and communication. Curr Rev Musculoskelet Med 2017; 10: 275–277.
- 6. Edwards IR, Lindquist M. Social media and networks in pharmacovigilance: boon or bane? Drug Safety 2011; 34: 267–271.
- 7. Bhattacharya M, Snyder S, Malin M, et al. Using social media data in routine pharmacovigilance: a pilot study to identify safety signals and patient perspectives. Pharmaceut Med 2017; 31: 167–174.
- 8. Duh MS, Cremieux P, Audenrode MV, et al. Can social media data lead to earlier detection of drug-related adverse events? Pharmacoepidemiol Drug Saf 2016; 25: 1425–1433.
- 9. Tricco AC, Zarin W, Lillie E, et al. Utility of social media and crowd-sourced data for pharmacovigilance: a scoping review protocol. BMJ Open 2017; 7: e013474.
- 10. Sloane R, Osanlou O, Lewis D, et al. Social media and pharmacovigilance: a review of the opportunities and challenges. Br J Clin Pharmacol 2015; 80: 910–920.
- 11. Bousquet C. The adverse drug reactions from patient reports in social media project: five major challenges to overcome to operationalize analysis and efficiently support pharmacovigilance process. JMIR Res Protoc 2017; 6.
- 12. Brosch S. Implications of WEB-RADR for pharmacovigilance. In: International Society of Pharmacovigilance (ISoP) 17th annual meeting, Liverpool, 2017.
- 13. Lengsavath M, Pra AD, Ferran A-Md, et al. Social media monitoring and adverse drug reaction reporting in pharmacovigilance: an overview of the regulatory landscape. Ther Innov Regul Sci 2017; 51: 125–131.
- 14. Pierce CE, Bouri K, Pamer C, et al. Evaluation of Facebook and Twitter monitoring to detect safety signals for medical products: an analysis of recent FDA safety alerts. Drug Safety 2017; 40: 317–331.
- 15. Powell GE, Seifert HA, Reblin T, et al. Social media listening for routine post-marketing safety surveillance. Drug Safety 2016; 39: 443–454.
- 16. Tim A, Casperson JLP, Juergen D. Strategies for distributed curation of social media data for safety and pharmacovigilance. In: International conference on data mining CSREA Press, 2016, pp.118–124.
- 17. European Medicines Agency. Guideline on good pharmacovigilance practices (GVP) Module VI – Collection, management and submission of reports of suspected adverse reactions to medicinal products (Rev 2). In: European Medicines Agency HoMA, (ed.). Rev 2nd ed. 2017.
- 18. Sukkar E. Searching social networks to detect adverse reactions. Pharm J 2015; 294.
- 19. Caster OLM, Vroman B, Van Stekelenborg J. Performance of disproportionality analysis for statistical signal detection in social media data (Abstract). Pharmacoepidemiol Drug Saf 2016; 25: 411.
- 20. Tregunno P. Harnessing mobile apps and social media for product safety. In: Tenth stakeholder forum on the pharmacovigilance legislation. London: EMA, 2016.
- 21. Epidemico. MedWatcher Social [computer software]. Boston, MA.
- 22. Powell GEDS, Bell HG, Anderson LS, et al. In their own words: social listening for ‘real-world benefits’ from prescription and OTC products. In: ISPOR 20th Annual International Meeting Philadelphia, PA, 2015.
- 23. Sarker A, O’Connor K, Ginn R, et al. Social media mining for toxicovigilance: automatic monitoring of prescription medication abuse from Twitter. Drug Safety 2016; 39: 231–240.
- 24. Purrington A. Utilization of social media for postmarketing surveillance: proof of concept study with PatientsLikeMe data for signal detection. In: DIA 2015 51st annual meeting: develop innovate advance Washington, DC, 2015.