Abstract
Personal health messages, inter-patient communications within online communities, represent a new path towards providing continuous information about patient-derived health status. We apply natural language processing techniques to personal health messages from online message boards to demonstrate the ability to track trends in people’s positive or negative opinion (sentiment) regarding particular drugs over time. The significant changes in sentiment correspond to FDA announcements and other publicity. We envision such analysis as a scalable tool for pharmacovigilance hypothesis generation for possible adverse drug reactions.
Measuring Population Health
Measuring population health requires recording health status for large numbers of people. Status of chronic conditions requires a level of detail beyond current medical records.1 Personal health messages are a widespread medium that provides medical information in natural language from patients. Previous work developed visualizations to aid physicians and patients within online communities to discover information about commonly used drugs, combinations, and adverse drug reactions.4,5
Electronic medical records (EMR) are capable of detecting serious adverse drug reactions such as myocardial infarction.2 However, personal health messages present more detailed and timely information from patients, information that otherwise takes longer to surface in EMRs because of the time lag between visits with physicians, especially for non-life-threatening adverse drug reactions. Drug surveillance is a large component of drug safety once a drug has been released to the public. Drug trials are too small to detect uncommon or rare adverse effects, often enrolling fewer than 2,000 test subjects.11 Patients or volunteers participating in drug trials also differ from those receiving licensed medications in age, co-morbidity and polypharmacy.11 All of these reasons, together with the need for continued monitoring of drugs for possible long-term effects, necessitate pharmacovigilance: the study of the safety of marketed drugs on an epidemiological scale.
We present a method of public health surveillance, a technique used within pharmacovigilance, to aid in the identification of candidate drugs that potentially cause adverse drug reactions. We focus on new methods for exploring and investigating patient-derived health data. It is important to understand how the general populace uses a particular drug and perceives its safety, reactions, and efficacy. One technique used to measure the effectiveness of a drug is quantifying the effects that it produces. Sentiment analysis, which determines the positive or negative valence of a text, is another way in which we can determine drug satisfaction. A drug may have many serious side effects, yet people may still have a positive attitude towards it, especially if they believe it helps them in some way.16
We apply a technique for automated sentiment analysis from computer science and natural language processing. Opinion mining and sentiment analysis is an established area within natural language processing, commonly used to aggregate and evaluate product reviews. In two surveys of more than 2000 American adults, 81% of Internet users (60% of Americans) have done online research on a product at least once, and 20% (15% of Americans) do so on a daily basis. We consider drugs analogous to other products: like many other products, they consist of name brands with marketing campaigns. We value others’ opinions in our own decision-making processes.12 Widespread dissatisfaction with a drug is alarming not only to others potentially taking the drug but also to oversight organizations such as the FDA. Widespread unease may be indicative of problems associated with the drug, ranging from adverse effects to concerns over pricing or efficacy; it can indicate to others that further investigation into the drug is warranted.
Sentiment analysis within this context is performed automatically by computer, which allows large numbers of reviews and responses to drugs to be analyzed rapidly. Although the analysis itself is automatic, it still requires reviews about specific drugs. Several health and drug review sites exist, such as PatientsLikeMe.com; however, such sites are nascent and lack the number of people and reviews needed to provide large population surveillance. Our method uses natural language processing and as such can utilize and aggregate information from disparate sources gathered across the web.
Methods Using Personal Health Messages
We focus on deriving data from personal health messages found in social forums. Yahoo Groups consists of an estimated 1 billion messages from 100 million users. The social forums we are currently looking at consist of 27,290 public Yahoo Health groups. Within these groups there are a total of 12,519,807 messages. These groups range from illness-based support groups, such as ones focusing on Multiple Sclerosis, to groups focusing on herbal home remedies. The messages within these groups span seven years and come from hundreds of thousands of unique email addresses, which we consider as a proxy for people. Many more public and private health groups exist, providing a plethora of data from numerous people, which we see as a potential source of data for population monitoring of drugs.
Within these forums, patients seek others’ support, advice and information about treatment options. We see the messages within these forums as unstructured diaries detailing a patient’s condition and treatment experiences. Often these responses detail reactions both positive and negative to specific medications. Below is an excerpt from a message from a neurological group that has been anonymized.
…when I gave up coffee and sugar in earnest and stopped the amitrptyline I was taking I am feeling much better still especially depression wise and the heavyness and sluggishnes that was in my legs is leaving. I am also getting back into a more normal sleep pattern of getting sleepy by normal time in evening and waking up in the morning more normally. I believe the years of amitriptyline for muscle relaxant were doing more damage than good and am doing pretty good controlling my night time bladder spasms by no coffee, magnesium and the diet…
We interpret the detailed responses that a patient writes as a “review.” Traditional sentiment analysis techniques are applied to these texts.
Our method utilizes portions of the lexicon in the Linguistic Inquiry and Word Count (LIWC)14 when calculating sentiment scores for messages. Previous work utilized LIWC to demonstrate variations in language usage between depressed and depression-vulnerable students.15 Further, it is known that the words people use correlate with their physical and mental health.13
Specifically, we use the words in LIWC corresponding to the following categories: positive emotion, negative emotion, anxiety, anger and sadness. We have augmented the LIWC lexicon to include a wide range of emoticons such as :) :( :P ^__^ LOL ROFL.
The messages from the Yahoo groups were parsed to extract just the textual information and to remove noise such as replies that are often included in messages. While replies help in understanding the context, a message’s emotional content should not be based on what other people write, only on the author’s own text.
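The reply-removal step described above can be illustrated with a minimal sketch. The markers used here (lines starting with ">", "On ... wrote:" attribution lines, and an "Original Message" separator) are assumed conventions of email-style quoting, not the parsing rules actually used in this work, and the function name is hypothetical.

```python
import re

# Assumed quoting conventions; the actual parsing rules are not specified.
QUOTED = re.compile(r"^\s*>")
ATTRIBUTION = re.compile(r"^\s*On .+ wrote:\s*$")
FORWARD_SEPARATOR = re.compile(r"^\s*-+\s*Original Message\s*-+\s*$", re.IGNORECASE)


def strip_quoted_replies(body: str) -> str:
    """Keep only the lines the author wrote, dropping quoted reply text."""
    kept = []
    for line in body.splitlines():
        if FORWARD_SEPARATOR.match(line):
            # Everything below the separator is a copy of an earlier message.
            break
        if QUOTED.match(line) or ATTRIBUTION.match(line):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


message = (
    "I stopped the drug last week.\n"
    "\n"
    "On Monday, Pat wrote:\n"
    "> How are you feeling?\n"
    "Much better now."
)
print(strip_quoted_replies(message))
```

Filtering line by line, rather than cutting at the first quote marker, keeps text the author wrote below a quoted passage.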
The resulting messages were matched against the LIWC lexicon categories and emoticons discussed previously. For each message we recorded the number of positive emotion words, the number of negative emotion words, and the total number of words. We found that the ratio of negative emotion words to total words was the most helpful in determining negative valence. The following methods and results use the negated ratio of negative emotion words to total words in a message so that, when graphing results, moving up the y-axis corresponds to less negative messages.
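As a rough sketch of this scoring, the example below counts lexicon matches and returns the negated ratio of negative tokens to total tokens. The word lists are tiny hypothetical stand-ins for the LIWC categories (the real lexicon is licensed and much larger), and the whitespace tokenization, exact-match lookup, and the assignment of emoticons to categories are simplifying assumptions.

```python
import string
from typing import NamedTuple

# Hypothetical toy lexicon standing in for the LIWC emotion categories.
POSITIVE = {"better", "good", "hope", "relief", "happy", ":)"}
NEGATIVE = {"pain", "worse", "afraid", "angry", "sad", ":("}
LEXICON = POSITIVE | NEGATIVE


class MessageScore(NamedTuple):
    positive: int
    negative: int
    total: int
    score: float  # negated ratio of negative tokens to total tokens


def score_message(text: str) -> MessageScore:
    positive = negative = total = 0
    for raw in text.split():
        # Check the raw token first so emoticons are not stripped away.
        token = raw if raw in LEXICON else raw.strip(string.punctuation).lower()
        if not token:
            continue
        total += 1
        if token in NEGATIVE:
            negative += 1
        elif token in POSITIVE:
            positive += 1
    # Scores lie in [-1, 0]; values closer to 0 mean less negative messages.
    score = -negative / total if total else 0.0
    return MessageScore(positive, negative, total, score)


print(score_message("The pain is worse today :("))
```

Because the score is a negated ratio, a message with no negative tokens scores 0 and a message made up entirely of negative tokens scores -1.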
Experimental Results
Our examples below are representative of what can be automatically extracted with simple techniques from health messages. The first experiment is a narrow focus group, a message group for persons with multiple sclerosis, a chronic demyelinating disease of the central nervous system. Here, approximately 500 people posted some 20,000 messages. We examine the timeline of the introduction of a unique treatment, the drug Tysabri, which was approved, subsequently removed from the market, and then re-approved by the FDA.
The second experiment is a broad focus group, concentrating on pain relievers. We focus on two drugs, Vioxx and Celebrex, which were subject to FDA warnings and withdrawals. These drugs provide a contrast in groups and illness types. Tysabri is used to treat a chronic, life-threatening or debilitating disease. It has a narrow purpose and few alternatives. Vioxx and Celebrex are broad-use drugs from a broad class, often used to treat chronic but non-life-threatening illness, for which there are many alternative medications, including over-the-counter ones. From 1999 to 2005, 106.62 million and 157.36 million prescriptions were dispensed in the United States for Vioxx and Celebrex, respectively.2 These numbers are a far cry from the 2,000 test subjects that undergo FDA drug trials.
Tysabri
Tysabri is a recently introduced prescription medication approved for patients with relapsing forms of Multiple Sclerosis (MS). It was originally approved by the FDA in November 2004 and was subsequently withdrawn by the manufacturer, Biogen-Idec, in February 2005. In June 2006 it was approved for resumed marketing.7 The use of this drug is narrow, specifically for MS.
We demonstrate the ability to track changes in sentiment within a specific group for a limited-use drug. Two MS groups that contained more than 500 instances of Tysabri were selected. The messages were evaluated using our augmented LIWC lexicon. A one-way ANOVA was run to determine if there was a statistically significant difference between the scores of messages in the following groups: messages containing Tysabri references pre-recall, during recall, and after the recall, and messages not containing Tysabri references pre-recall, during recall, and after the recall. We found the results to be statistically significant with p < .001. We plotted the outlier-corrected means of the messages containing Tysabri references versus control (messages with no Tysabri references). See Figure 1 for this graph. Data were binned by quarter to increase the number of samples per data point.
Tysabri first appears shortly after it was approved by the FDA, in the November/December timeframe, as seen by the first data point. After its introduction the sentiment initially improved, then became increasingly negative during the period in which Biogen-Idec removed it from the market. After the drug’s re-introduction to the market the sentiment was extremely positive compared to the time period before and during its recall. Now that the drug is back on the market, the sentiment appears to have stabilized at a slightly more negative point than at its reintroduction. We conjecture that the highly positive sentiment was due to people’s hope for the drug, whereas the actual effect on the population lagged behind and can be seen in the stabilized sentiment.
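For readers unfamiliar with the test, the F statistic of a one-way ANOVA over groups of message scores can be sketched as follows. This is a generic textbook computation, not the statistical software used in this work; the function name and the toy scores are hypothetical. The p-value would then be read from an F distribution with the returned degrees of freedom, which is not shown here.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy scores for three hypothetical message groups (not data from the study).
pre_recall = [-0.02, -0.03, -0.01]
during_recall = [-0.08, -0.09, -0.07]
post_recall = [-0.01, -0.02, -0.02]
print(one_way_anova_f([pre_recall, during_recall, post_recall]))
```

A large F indicates that the differences between group means are large relative to the spread of scores within each group.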
Vioxx
We looked for groups that contained 100 or more instances of Vioxx or Celebrex. Our data was spread over more than 25,000 newsgroups spanning 7 years. This selection process was used to weed out newsgroups that did not contain any references to either of the drugs and to ensure there were similar numbers of drug messages and control messages. We only consider the sentiment of control messages from groups that talk about either of the drugs. The pruning resulted in 40 groups with 681,516 people (we consider an email address a proxy for a person) and 867,659 messages, with an average message length of 1,513 characters. We were interested in determining whether our system was able to accurately determine affect for large groups as well as for broad-coverage drugs. Vioxx and Celebrex are commonly used pain relievers. Vioxx was first marketed in 1999 and subsequently withdrawn in 2004.8 We tracked the sentiment of these two drugs over time. See Figure 2 for data points and dates.
The messages were evaluated using our augmented LIWC lexicon. A one-way ANOVA was performed to determine if there was a statistically significant difference between the scores of messages in the following groups: messages containing Vioxx pre and post recall; messages containing Celebrex pre and post recall; and messages not containing Vioxx or Celebrex pre and post recall. The results are statistically significant with p < .001. The outlier-corrected means of messages containing Vioxx references, Celebrex references, and control (messages with no Vioxx or Celebrex references) are plotted in Figure 2. Data is binned by quarter to increase the number of samples per data point.
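The group-selection step can be sketched as a simple mention count per group. The example assumes that an "instance" is a case-insensitive whole-word occurrence of a drug name and that each group is available as a list of message strings; both are assumptions for illustration, and the function names are hypothetical.

```python
import re


def count_mentions(messages, drug_names):
    """Count case-insensitive whole-word occurrences of any drug name."""
    alternatives = "|".join(re.escape(name) for name in drug_names)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return sum(len(pattern.findall(message)) for message in messages)


def select_groups(groups, drug_names, min_mentions):
    """Return names of groups with at least `min_mentions` drug mentions."""
    return [
        name
        for name, messages in groups.items()
        if count_mentions(messages, drug_names) >= min_mentions
    ]


# Toy data; the threshold is a parameter rather than a fixed value.
groups = {
    "pain-support": ["Vioxx helped", "switched from vioxx to Celebrex"],
    "herbal-remedies": ["chamomile tea is nice"],
}
print(select_groups(groups, ["Vioxx", "Celebrex"], min_mentions=3))
```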
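Binning by quarter can be sketched as below, assuming each message has already been reduced to a (posting date, score) pair. The plain mean is used for illustration; the outlier correction is not specified in enough detail to reproduce, so it is omitted here. Function names and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def quarter_of(d: date):
    """Map a date to a (year, quarter) key, with quarters numbered 1 to 4."""
    return d.year, (d.month - 1) // 3 + 1


def quarterly_means(scored_messages):
    """Average scores per calendar quarter, in chronological order."""
    bins = defaultdict(list)
    for posted, score in scored_messages:
        bins[quarter_of(posted)].append(score)
    return {quarter: fmean(scores) for quarter, scores in sorted(bins.items())}


# Toy (date, score) pairs, not data from the study.
sample = [
    (date(2004, 11, 30), -0.25),
    (date(2004, 12, 15), -0.75),
    (date(2005, 2, 1), -0.5),
]
print(quarterly_means(sample))
```

Coarser bins give more messages per data point at the cost of temporal resolution, which matters when aligning sentiment changes with dated announcements.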
Sentiment sharply increases for Celebrex and similarly decreases for Vioxx. On October 30, 2003 the Wall Street Journal reported on research presented at a medical conference and funded by Merck, the maker of Vioxx, which found “an increased risk of heart attack…compared with patients taking a competing painkiller, Celebrex, from Pfizer Inc.”3
Sentiment for Celebrex increases again, while sentiment for Vioxx decreases. On April 14, 2004, in a study sponsored by Merck, researchers found an elevated risk of acute myocardial infarction associated with Vioxx but not with Celebrex.17 A similar study was published in May 2004 showing that rates of admission for congestive heart failure were higher for patients on Vioxx and non-selective NSAIDs.10
On September 30, 2004 Merck voluntarily pulled Vioxx from the market.8 Intuitively one would expect sentiment towards the drug to decrease; however, one can imagine that sentiment might instead increase because of the relief people feel now that they are no longer on the drug and it is no longer available.
Sentiment for Celebrex declined, whereas Vioxx sentiment looks somewhat static. On December 17, 2004 the FDA released a statement that the National Cancer Institute and Pfizer had stopped a clinical trial for Celebrex after determining: “Patients in the clinical trial taking 400 mg. of Celebrex twice daily had a 3.4 times greater risk of CV [cardiovascular] events compared to placebo. For patients taking 200 mg. of Celebrex, the risk was 2.5 times greater.”9
Again, sentiment for Celebrex declined. On April 7, 2005 the FDA ordered Pfizer, the maker of Celebrex, to remove a related drug, Bextra.6 However, Celebrex was allowed to remain on the market with a boxed warning about potential cardiovascular events and life-threatening gastrointestinal bleeding.6
Conclusions and Future Work
We have built a coarse sentiment analyzer that demonstrates the ability to track trends in people’s sentiment regarding particular drugs over time. The changes in sentiment correspond to FDA announcements and other publicity. It is well understood that the media is able to influence the public’s opinion, and we demonstrate that sentiment analysis is sensitive enough to detect news-correlated changes. The sentiment differences between drug and non-drug containing messages are statistically significant. Overall negative sentiment towards a particular drug, or changes in trend, may indicate possible dissatisfaction with that drug and warrant further investigation into the causes of dissatisfaction, including adverse drug reactions.
Previous work in other domains suggests that a better approach is to use machine learning based approaches to train classifiers on sample messages, as judged by humans, and then evaluate the trained analyzer on different samples.12 We are currently in the process of generating training data and will shortly rerun the sentiment analysis. We are also developing better quality filters for these messages, to filter out noise.
The methodology we present is a first step, demonstrating the ability to track change in sentiment over time. Work in determining the causation for the changes in sentiment over time still remains. We are currently implementing more sophisticated natural language processing such as named entity recognition to identify drugs and drug outcomes. The same methodology will be applied to health messages to isolate patients taking Vioxx and Tysabri to track their sentiment and drug outcomes leading to automatic methods of adverse drug reaction prediction. Our goal is to develop population health measurement tools to predict treatment outcomes.
Acknowledgments
Partial funding was provided by NSF FIBR grant 0425852 and by a finalist grant from the CIMIT Prize for Primary Healthcare. Eric Gilbert provided guidance on the sentiment analysis and training.
References
- 1. Berlin R, Schatz B. Internet Health Monitors for Outcomes of Chronic Illness. Medscape General Medicine, 6 sections (12pp), Sept 2, 1999.
- 2. Brownstein JS, Sordo M, Kohane IS, Mandl KD. The tell-tale heart: Population-based surveillance reveals an association of rofecoxib and celecoxib with myocardial infarction. PLoS ONE. 2007;2(9):e840. doi: 10.1371/journal.pone.0000840.
- 3. Burton TM, Callahan P. Vioxx study sees heart-attack risk: Merck funded research after concerns were raised about its painkilling drug. Wall Street Journal. 2003 Oct 30:B1–2.
- 4. Chee B, Berlin R, Schatz B. Information Visualization of Drug Regimens from Health Messages. HEALTHINF 2009. Porto, Portugal, January 2009.
- 5. Chee B, Karahalios KG, Schatz B. Social visualization of health messages. HICSS ’09. Waikoloa, HI, January 2009.
- 6. FDA. Important changes and additional warnings for cox-2 selective and non-selective non-steroidal anti-inflammatory drugs (nsaids). http://www.fda.gov/CDER/Drug/advisory/COX2.htm 2005. Accessed March 12, 2009.
- 7. FDA. Approves resume marketing of tysabri under a special distribution program. http://www.fda.gov/bbs/topics/NEWS/2006/NEW01380.html 2006. Accessed March 12, 2009.
- 8. FDA. Public health advisory on vioxx as its manufacturer voluntarily withdraws the product. http://www.fda.gov/bbs/topics/news/2004/new01122.html 2004. Accessed March 12, 2009.
- 9. FDA. Statement on the halting of a clinical trial of the cox-2 inhibitor celebrex. http://www.fda.gov/bbs/topics/news/2004/NEW01144.html 2004. Accessed March 12, 2009.
- 10. Mamdani M, Juurlink DN, Lee DS, Rochon PA, Kopp A, Naglie G, et al. Cyclo-oxygenase-2 inhibitors versus non-selective non-steroidal anti-inflammatory drugs and congestive heart failure outcomes in elderly patients: a population-based cohort study. The Lancet. 2004;363(9423):1751–1756. doi: 10.1016/S0140-6736(04)16299-5.
- 11. Mann RD, Andrews EB. Introduction. In: Mann RD, Andrews EB, editors. Pharmacovigilance. West Sussex: John Wiley & Sons; 2002. pp. 3–10.
- 12. Pang B, Lee L. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval. 2008;2(1–2):1–135.
- 13. Pennebaker JW, Campbell RS. The effects of writing about traumatic experience. Clinical Quarterly. 2000;9:17–21.
- 14. Pennebaker JW, Francis ME, Booth RJ. Linguistic inquiry and word count: LIWC 2007. New Jersey: Lawrence Erlbaum Assoc; 2007.
- 15. Rude SS, Gortner E-M, Pennebaker JW. Language use of depressed and depression-vulnerable college students. Cognition and Emotion. 2004;18(8):1121–1133.
- 16. Silver M. Success with heart failure. Cambridge MA: Perseus Publishing; 2002.
- 17. Solomon DH, Schneeweiss S, Glynn RJ, Kiyota Y, Levin R, Mogun H, Avorn J. Relationship between selective cyclooxygenase-2 inhibitors and acute myocardial infarction in older adults. Circulation. 2004;109:2068–2073. doi: 10.1161/01.CIR.0000127578.21885.3E.