Abstract
Online communities have been an integral part of tobacco cessation programs. They are rich in content, and offer insights into factors affecting an individual’s behavior change efforts. We used word representation techniques to infer implicit meaning embedded in messages exchanged in a health-related online community. Our analysis of peer interactions revealed that individuals factor in safety, glamour, expense, and media projection when choosing a form of nicotine intake. When choosing pharmacotherapy techniques, individuals focus on brands, dosage, and side effects associated with each form (e.g. gums, patches). Our analysis sheds light on factors embedded in peer interactions, which might lead to opinion formation based on peer influence and knowledge dissemination in these social platforms. Such understanding enables design of high-engagement behavior change technologies, through personalization of content delivery by factoring in individual-level beliefs, behavioral state, and community-level influences.
Keywords: Tobacco cessation, online communities, text analysis, temporal analysis
Introduction
Unhealthy behaviors such as tobacco use, physical inactivity, and poor nutrition are the leading causes of death worldwide [1]. Personalized behavioral interventions can help individuals abstain from unhealthy behaviors and engage in healthier lifestyle choices [2]. In particular, interventions targeted at addictive behaviors such as smoking should include both behavioral and pharmacological support [3]. Recent advances in the use of social media platforms allow for peer-to-peer sharing of this information in an engaging format. Health-related online communities specifically designed for promoting healthy behaviors provide researchers with an unprecedented opportunity to understand end-user needs and fine-tune the personalization of behavior change technologies in this digital era of healthcare [2]. Digital traces of peer interactions on these platforms provide a rich account of users’ behavioral state and social influences [4]. Applying advanced text analytics enables us to derive user-specific and content-specific tailoring strategies that can result in high levels of user engagement in behavior change tasks [5].
Recent research on health-related online social media platforms focuses on the study of (1) communication structure, to establish social influence and information diffusion [6,7]; (2) communication content, to validate social support, self-efficacy, and other theoretical constructs in peer-to-peer communication [8]; and (3) structure plus content, to derive content-specific network influence patterns underlying peer interactions. Specifically, methods from distributional semantics hold promise for annotating content in online communities when used in conjunction with machine learning classification [9,10]. These methods project distributed representations of terms into a high-dimensional vector space in order to learn the semantic similarity between words from unstructured data.
In this study, our overarching aim is to understand user communication trends of factors related to tobacco use in QuitNet [11], which is one of the first online social communities for health behavior change, specifically smoking cessation. Our specific objectives are (1) to use representational terms to obtain a snapshot of peer interactions on topics related to tobacco use (pharmacotherapy and alternative nicotine intake modes), and (2) to visualize the changes in the density of discussions related to these topics from 2000 to 2015. Such analysis allows for understanding of psychosociobehavioral factors affecting individuals’ tobacco use. This may subsequently enable group-level and individual-level personalization of corresponding behavioral interventions. In the next sections of the paper, we describe the theoretical rationale of our work, our methodological underpinnings, and subsequent results.
1. Theoretical Background
Modes of nicotine intake and pharmacotherapy techniques play an important role in the promotion of tobacco cessation [3]. Understanding individuals’ perception of these two topics is vital to designing personalized interventions, which in general are perceived favorably by individuals [2].
While there has been a significant decline in cigarette smoking, the use of emerging tobacco products such as cigars, e-cigarettes, and hookah has increased. Users’ preference for one method over another can be attributed to perceived benefits and individual preferences [1]. In addition, studies have shown that choosing the form of nicotine replacement therapy based on the individual’s needs and preferences may improve abstinence rates [12]. Drugs approved for safe use around the world include Nicotine Replacement Therapy (NRT) in the form of patches, gum, nasal spray, inhalers, and lozenges. Negative side effects of such treatment can also affect users’ perception and choice [2].
In our study, we aim to understand community-level shifts across a range of topics related to nicotine intake modes and pharmacotherapy over 15 years. To gain these insights, we employ word representation techniques that are less manually intensive and enable information retrieval based on implicit semantic relationships.
2. Materials and Methods
QuitNet, an online community promoting smoking cessation, has been in continuous existence for the past 16 years and has over 100,000 registrants per year from various countries. Studies have shown that participation in QuitNet is strongly correlated with abstinence [17]. In this study, we used the publicly available forum messages from QuitNet, which are the primary mode of communication. We used 2,467,550 messages exchanged among 97,790 unique users from 2000 to 2015. The mean age of users on QuitNet is 41 years. Consistently throughout the years, around 65% of the QuitNet community users were female.
First, the messages were divided based on the year in which they were exchanged. Next, representational terms for each of the topics were identified from the literature [18,19]. The topics and the corresponding representational terms are shown in Table 1.
Table 1.
Topics of communication and the representational terms across each topic
| Topic | Representational Terms |
|---|---|
| Modes of nicotine intake | E-cigarettes, snus, chew, cigars, hookah, kreteks, pipe |
| Pharmacotherapy | NRT, injection, gums, patch, lozenges |
Word representation techniques were used to identify implicit meaning in communication. Specifically, we used a distributional method known as neural word embedding, based on the Skip-gram with Negative Sampling (SGNS) algorithm [20] as implemented in an open source package called Semantic Vectors [21]. Word embeddings represent words as vectors of continuous values, which permits estimation of the distance between them and thereby helps identify the nearest neighbors of a representational term based on relatedness. Because QuitNet messages are short and terse, Wikipedia was used as a background corpus to provide sufficient semantic context. The Wikipedia corpus contains 1.9 billion words in more than 4.4 million articles. We derived word embeddings from the Wikipedia corpus to obtain Wikipedia term vectors. Wiki-based QuitNet message vectors were then obtained by representing each QuitNet message as the normalized sum of the Wikipedia term vectors. We then obtained QuitNet term vectors by adding the QuitNet message vectors for each term that occurred in QuitNet and normalizing the resulting vector. This procedure is illustrated in Figure 1. We then used the topic-specific representational terms (see Table 1) to obtain the nearest neighboring messages. To estimate the accuracy of the retrieved messages, the precision of the information retrieval system (relevant messages retrieved / total messages retrieved) was calculated.
Figure 1.

Vector generation process for obtaining nearest neighbors
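The vector generation procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Semantic Vectors implementation used in the study: the function names, the three-dimensional toy vectors, and the choice to count each term once per message are all assumptions made for the example.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def message_vector(tokens, background_vectors, dim):
    """Message vector = normalized sum of background (e.g. Wikipedia) term vectors."""
    total = np.zeros(dim)
    for tok in tokens:
        if tok in background_vectors:  # terms missing from the background corpus are skipped
            total += background_vectors[tok]
    return normalize(total)

def community_term_vectors(messages, message_vectors, dim):
    """Term vector = normalized sum of the vectors of messages containing the term.

    Assumption: a term is counted once per message, even if it repeats.
    """
    sums = {}
    for tokens, vec in zip(messages, message_vectors):
        for tok in set(tokens):
            sums[tok] = sums.get(tok, np.zeros(dim)) + vec
    return {tok: normalize(v) for tok, v in sums.items()}

def nearest_messages(term, term_vectors, message_vectors, k=10):
    """Rank messages by cosine similarity to a term vector.

    All vectors are unit length, so the dot product equals the cosine.
    """
    query = term_vectors[term]
    scores = np.array([float(np.dot(query, m)) for m in message_vectors])
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Hypothetical toy data standing in for Wikipedia term vectors and QuitNet messages.
background = {
    "patch": np.array([1.0, 0.0, 0.0]),
    "nicotine": np.array([0.8, 0.6, 0.0]),
    "hookah": np.array([0.0, 1.0, 0.0]),
    "lounge": np.array([0.0, 0.8, 0.6]),
}
messages = [["nicotine", "patch"], ["hookah", "lounge"], ["patch"]]
msg_vecs = [message_vector(m, background, 3) for m in messages]
term_vecs = community_term_vectors(messages, msg_vecs, 3)
top = nearest_messages("patch", term_vecs, msg_vecs, k=2)
print(top)
```

In this toy example, the two messages that mention "patch" are retrieved ahead of the hookah message, because the "patch" term vector is built from exactly those messages.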
In order to observe trends in topics across years, the top 10 nearest neighbors (nearest messages) of each representational term were manually analyzed message by message. We used the Semantic Vectors package [21] to rank the retrieved messages in order of their semantic similarity to the search terms. The semantic similarity score is the cosine similarity between the vector for the representational term and the vector for each retrieved message. Z-scores were then calculated from the sum of the semantic similarity scores of the related neighboring messages derived for each term, normalized by the number of unique users in each year. The Z-score estimates how far a year’s semantic similarity score lies from the mean, allowing us to observe fluctuations in specific topics of interest, as embedded in QuitNet users’ communication, across years.
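One possible reading of the Z-score step is sketched below. It assumes that the per-year sum of similarity scores is divided by that year's number of unique users and that the population standard deviation across years is used; neither detail is specified in the text, and the function name and the numbers are hypothetical.

```python
import statistics

def yearly_z_scores(similarity_sums, unique_users):
    """Z-score of the per-user similarity sum for each year.

    similarity_sums: {year: sum of cosine similarity scores for a topic}
    unique_users:    {year: number of unique users active that year}
    Assumption: normalization means dividing by the unique-user count, and
    the population standard deviation across years is used.
    """
    years = sorted(similarity_sums)
    per_user = [similarity_sums[y] / unique_users[y] for y in years]
    mean = statistics.mean(per_user)
    sd = statistics.pstdev(per_user)
    if sd == 0:
        # No variation across years: every year sits exactly at the mean.
        return {y: 0.0 for y in years}
    return {y: (v - mean) / sd for y, v in zip(years, per_user)}

# Hypothetical values for illustration only; these are not QuitNet data.
sums = {2000: 10.0, 2001: 20.0, 2002: 60.0}
users = {2000: 100, 2001: 100, 2002: 200}
z = yearly_z_scores(sums, users)
for year, score in z.items():
    print(year, round(score, 2))
```

A positive value marks a year in which the topic was discussed more than average relative to the size of the active community, and a negative value marks a year in which it was discussed less.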
3. Results and Discussion
The average precision across years for all representational terms for “modes of nicotine intake” is 0.64 and the average precision across years for all representational terms for the topic “pharmacotherapy” is 0.76. The precision of the information retrieval system specific to the representational term in each topic across years is given in Table 2.
Table 2.
Precision of representational term across years
| Topic | Representational Term | Precision |
|---|---|---|
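| Modes of nicotine intake | E-cigarettes | 0.64 |
| Modes of nicotine intake | Snus | 0.69 |
| Modes of nicotine intake | Chew | 0.59 |
| Modes of nicotine intake | Cigars | 0.49 |
| Modes of nicotine intake | Hookah | 0.72 |
| Modes of nicotine intake | Kreteks | 0.68 |
| Modes of nicotine intake | Pipe | 0.70 |
| Pharmacotherapy | NRT | 0.77 |
| Pharmacotherapy | Injection | 0.62 |
| Pharmacotherapy | Lozenges | 0.82 |
| Pharmacotherapy | Patch | 0.81 |
| Pharmacotherapy | Gum | 0.75 |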
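The precision measure defined in the Methods (relevant messages retrieved / total messages retrieved) can be illustrated with a short sketch. The relevance judgments below are hypothetical placeholders for manual annotations, and the helper names are assumptions introduced for this example.

```python
def precision(relevance_flags):
    """Fraction of retrieved messages judged relevant."""
    if not relevance_flags:
        raise ValueError("no retrieved messages to evaluate")
    return sum(relevance_flags) / len(relevance_flags)

def mean_precision(judgments):
    """Unweighted mean of per-term precision values."""
    values = [precision(flags) for flags in judgments.values()]
    return sum(values) / len(values)

# Hypothetical manual judgments for the top 10 messages of two terms.
demo_judgments = {
    "patch": [True] * 8 + [False] * 2,
    "gum": [True] * 6 + [False] * 4,
}
per_term = {term: precision(flags) for term, flags in demo_judgments.items()}
overall = mean_precision(demo_judgments)
print(per_term, overall)
```

Averaging per-term values in this way weights each representational term equally, regardless of how many messages were retrieved for it.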
With respect to modes of nicotine intake, users’ communication indicates that the most common methods were cigars and pipes. Hookah first gained popular mention in QuitNet in 2003. Similarly, snus, kreteks, and e-cigarettes were first mentioned in 2004, 2005, and 2009, respectively. The z-scores in Figure 2 show that the topic swings back and forth through the years until 2009. After 2010, mentions of alternative modes of nicotine intake rise sharply until 2015.
Figure 2.

Z-scores for the topics – Modes of nicotine intake and pharmacotherapy
In addition, granular qualitative analysis of the nearest neighbors revealed that users discuss FDA approval and side effects for new modes of nicotine intake. These details may later factor into their choice of a mode, which in turn may lead them to use what QuitNet users perceive as safer modes. Based on manual analysis of nearest neighboring messages, QuitNet users perceived hookah, e-cigarettes, and snus to be ‘safer’ modes. It was interesting to see that users’ choice was also based on the glamorous or exotic nature of a mode. In QuitNet, users associated these terms with cigars and pipes. Marketing and portrayal in the media may influence regular use of a mode [1]. Descriptive labels on tobacco products, such as “low-tar” or “light”, may mislead people into perceiving them as safer options [1], which QuitNet users mention in their messages.
All forms of pharmacotherapy, such as NRT, gums, patches, and lozenges, were popular among QuitNet users. The brands most often mentioned by users were Wellbutrin, Zyban, and Chantix. A multitude of physical and biological consequences were mentioned, such as sore throat, coughing, bloating, stuffiness, sneezing, sinuses, allergies, bleeding, itching, infections, and nausea. As seen in Figure 2, mentions of the topic initially trended downward in 2000, followed by a sharp rise in mentions of pharmacotherapy options from 2010 to 2015.
In summary, we have analyzed, measured, and visualized trends in peer-to-peer communication through the use of representational terms of topics. The results have given insight into user perceptions pertaining to smoking cessation, specifically “modes of nicotine intake” and “pharmacotherapy”, which can lead to community-level content refinement. For example, we observed that users perceive hookahs as ‘safer’ modes of nicotine intake. Users could be presented with information from research articles and/or their peers’ experience that can help them understand that there is no ‘safe’ mode of nicotine intake. Also, users who mention side effects other than the commonly reported ones can be directed to an expert-moderated forum, which might lead the individual to choose alternative methods of treatment and, eventually, to a safe and effective quit process.
4. Limitations and Future Work
This study is an effort to establish the use of word embeddings for exploratory understanding of a behavioral topic, in our case modes of nicotine intake and pharmacotherapy techniques. While the analysis presented in this paper is conducted at a community level, future studies may benefit from nuanced analysis of user-level (e.g. gender/age level differences in perceptions and choices) and behavior-level (e.g. pharmacological support discussions exchanged among relapsers) communication trends. The accuracy of the system can be improved in the future by incorporating more sophisticated machine learning techniques (e.g. convolutional neural networks) into the word embeddings pipeline.
5. Conclusion
This study examines topics related to tobacco use as manifested in peer-to-peer communication among users of an online community. Further, our study quantifies the communication density of the specific topics of interest. We take an exploratory approach, using representational terms to retrieve major topics related to nicotine use and pharmacotherapy. To accomplish this, we have used word embedding techniques to identify implicit meaning in communication, thus leading to a broader set of representational terms to account for vocabulary specific to QuitNet users. Although we have used word representation techniques at a community level, these methods can also be used at a more granular level, investigating traits of specific user groups and individual users.
Studying the modes of nicotine intake and pharmacotherapy techniques offers insight into interventional markers for tobacco cessation. Appropriate behavior modification techniques can be introduced based on current needs. The methods, when applied to user groups or individuals, can help us identify their specific needs and in turn allow for more personalized interventions in the form of recommendation engines that allow for content refinement and suggestion of evidence-based educational materials.
Acknowledgments
Research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under Award Number 1R21LM012271-01. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
References
- 1. World Health Organization. Report on the global tobacco epidemic, 2015: Warning about the dangers of tobacco. Geneva, Switzerland: World Health Organization; 2015.
- 2. Strecher VJ, Shiffman S, West R. Randomized controlled trial of a web-based computer-tailored smoking cessation program as a supplement to nicotine patch therapy. Addiction. 2005;100:682–8. doi: 10.1111/j.1360-0443.2005.01093.x.
- 3. Roberts NJ, Kerr SM, Smith SM. Behavioral interventions associated with smoking cessation in the treatment of tobacco use. Health Services Insights. 2013;6:79–85. doi: 10.4137/HSI.S11092.
- 4. Zhang M. Social media analytics of smoking cessation intervention: user behavior analysis, classification, and prediction. PhD Thesis. Philadelphia, Pennsylvania: Drexel University; 2015.
- 5. Brug J, Oenema A, Campbell M. Past, present and future of computer-tailored nutrition education. The American Journal of Clinical Nutrition. 2003;77:1028–34S. doi: 10.1093/ajcn/77.4.1028S.
- 6. Cobb NL, Graham AL, Abrams DB. Initial evaluation of a real-world internet smoking cessation system. American Journal of Public Health. 2010;100:1282–1289. doi: 10.2105/AJPH.2009.165449.
- 7. Centola D. The spread of behavior in an online social network experiment. Science. 2010;329:1194–1197. doi: 10.1126/science.1185231.
- 8. Graham AL, Papandonatos GD, Kang H, Moreno JL, Abrams DB. Development and validation of the online social support for smokers scale. J Med Internet Res. 2011;13:e69. doi: 10.2196/jmir.1801.
- 9. Myneni S, Fujimoto K, Cobb N, Cohen T. Content-Driven Analysis of an Online Community for Smoking Cessation: Integration of Qualitative Techniques, Automated Text Analysis, and Affiliation Networks. American Journal of Public Health. 2015;105:1206–12. doi: 10.2105/AJPH.2014.302464.
- 10. Sridharan V, Cohen T, Cobb N, Myneni S. Characterization of temporal semantic shifts in peer-to-peer communication in a health-related online community: Implications for data-driven health promotion. AMIA Annual Symposium Proceedings. 2016.
- 11. QuitNet LLC. URL: https://quitnet.meyouhealth.com/ [accessed 2016-10-28].
- 12. McClure JB, Swan GE. Tailoring nicotine replacement therapy rationale and potential approaches. CNS Drugs. 2006;20:281–291. doi: 10.2165/00023210-200620040-00002.
- 13. Chen AT, Zhu S, Conway M. What online communities can tell us about electronic cigarettes and hookah use: A study using text mining and visualization techniques. J Med Internet Res. 2015;17:9, e220. doi: 10.2196/jmir.4517.
- 14. Tighe PJ, Goldsmith RC, Gravenstein M, Bernard HR, Fillingim RB. The painful tweet: Text, sentiment, and community structure analyses of tweets pertaining to pain. J Med Internet Res. 2015;17:4, e84. doi: 10.2196/jmir.3769.
- 15. Shahab L, West R. Differences in happiness between smokers, ex-smokers and never smokers: cross-sectional findings from a national household survey. Drug and Alcohol Dependence. 2012;121(1–2):38–44. doi: 10.1016/j.drugalcdep.2011.08.011.
- 16. Cobb NK, Mays D, Graham AL. Sentiment Analysis to Determine the Impact of Online Messages on Smokers’ Choice to Use Varenicline. J Natl Cancer Inst Monogr. 2013;47:224–230. doi: 10.1093/jncimonographs/lgt020.
- 17. Graham AL, Papandonatos GD, Erar B, Stanton CA. Use of an online smoking cessation community promotes abstinence: results of propensity score weighting. Health Psychology. 2015;34:1286–1295. doi: 10.1037/hea0000278.
- 18. World Health Organization. Tobacco: Deadly in Any Form or Disguise. Geneva: World Health Organization; 2006. [accessed 2016-11-09].
- 19. Jiloha RC. Pharmacotherapy of smoking cessation. Indian Journal of Psychiatry. 2014;56:87–95. doi: 10.4103/0019-5545.124726.
- 20. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. NIPS’13 Proceedings of the 26th International Conference on Neural Information Processing Systems; pp. 3111–3119.
- 21. Widdows D, Cohen T. The semantic vectors package: new algorithms and public tools for distributional semantics. ICSC’10. Proceedings of the 2010 IEEE Fourth International Conference on Semantic Computing; pp. 9–15.
