Classification of Health-Related Social Media Posts: Evaluation of Post Content–Classifier Models and Analysis of User Demographics

Ryan Rivas; Shouq A Sadah; Yuhang Guo; Vagelis Hristidis

doi:10.2196/14952

2020 Apr 1;6(2):e14952. doi: 10.2196/14952

Editor: Gunther Eysenbach

Reviewed by: Anis Davoudi, Jon-Patrick Allem

PMCID: PMC7160708 PMID: 32234706

Abstract

Background

The growing volume of health-related social media activity, where users connect, collaborate, and engage, has made it increasingly important to analyze how people use health-related social media.

Objective

The aim of this study was to classify the content of health-related social media posts (eg, posts that share experiences and seek support) and to study the effect of user demographics on post content.

Methods

We analyzed two different types of health-related social media: (1) health-related online forums—WebMD and DailyStrength—and (2) general online social networks—Twitter and Google+. We identified several categories of post content and built classifiers to automatically detect these categories. These classifiers were used to study the distribution of categories for various demographic groups.

Results

We achieved an accuracy of at least 84% and a balanced accuracy of at least 0.81 for half of the post content categories in our experiments. In addition, 70.04% (4741/6769) of posts by male WebMD users asked for advice, and male users’ WebMD posts were more likely to ask for medical advice than female users’ posts. The majority of posts on DailyStrength shared experiences, regardless of the gender, age group, or location of their authors. Furthermore, health-related posts on Twitter and Google+ were used to share experiences less frequently than posts on WebMD and DailyStrength.

Conclusions

We studied and analyzed the content of health-related social media posts. Our results can guide health advocates and researchers to better target patient populations based on the application type. Given a research question or an outreach goal, our results can be used to choose the best online forums to answer the question or disseminate a message.

Keywords: social media, demographics, classification

Introduction

Background

There is a huge amount of knowledge waiting to be extracted from health-related online social networks and forums, which we collectively refer to as social media. Health-related social media store the interactions of users who are interested in health-related topics [1]. These users share their experiences, share information about friends and family, or seek help for a wide range of health issues [1]. More than 60 million Americans have read or collaborated in health 2.0 resources [2]. In addition, 40% of Americans have doubted a professional opinion when it conflicted with the opinions expressed in health-related social media [2]. Health-related social media widen access to health information for the public, regardless of individuals’ race, age, locality, or education [1].

In this study, we evaluated the content of posts in various health-related social media. We analyzed two types of health-related social media: (1) health-related online forums: WebMD and DailyStrength and (2) general social networks: Google+ and Twitter. This was a 4-step process comprising data collection, identification of post content categories, classification experiments, and a demographics analysis. We first collected large datasets of posts from each source. Afterward, we identified meaningful categories from randomly selected posts from each source. In our classification experiments, we labeled data from each source and trained classifiers to identify post content categories. Finally, we used classifiers trained on our labeled data to identify categories in the remaining data and analyzed how often posts in these categories are made by various demographic groups.

The goal of this study was to provide researchers with information and tools to support further research. For example, researchers looking for clinical trial participants can use DailyStrength, where users often share experiences about a particular condition, and health advocates seeking to spread awareness about a condition that affects men can use WebMD, where men often ask for advice. To this end, we also made comparisons between platforms to suggest where such a researcher might begin looking. The classifier models built in this study can assist with this task as well as other analyses involving health-related online postings.

Related Work

Analysis of Health-Related Social Media

Many studies have been performed to characterize health-related social media communities. Hackworth and Kunz [3] reported that 80% of Americans have searched the internet for health-related information, more than 60 million Americans are consumers of social networks in the Web 2.0 environment (health 2.0), and consumers, especially those with chronic conditions, are leading the health 2.0 movement by seeking clinical knowledge and emotional support. Wiley et al [4] studied the impact of different characteristics of various social media forums on drug-related content and demonstrated that the characteristics of a social media platform affect several aspects of discussion. Eichstaedt et al [5] predicted the county-level heart disease mortality by capturing the psychological characteristics of local communities through expressed text in Twitter. However, these studies do not describe or compare specific demographics in terms of their post content.

Further work has focused on categorizing health-related posts based on their content. Yu et al [6] performed a preliminary content analysis of a D/deaf and hard of hearing discussion forum, AllDeaf, to observe different types of social support behaviors and identify social support features for a future text classification task. Reavley and Pilkington [7] analyzed the content of tweets related to depression and schizophrenia, finding that tweets about depression mostly discussed consumer resources and advertisements, whereas tweets about schizophrenia mostly raised awareness and reported research findings. Lee et al [8] analyzed the content of tweets from health-related Twitter users, finding that they tweet about testable claims and personal experiences. Lopes and Da Silva [9] collected posts from a health-related online forum, MedHelp, and used them to propose and refine a scheme for manually classifying health-related forum posts into 4 categories and a total of 23 subcategories. Our work builds upon these studies by defining our own categories of post content, some of which have analogues in these studies.

Health-Related Demographic Analysis

Other work has compared health issues between demographics or examined the demographics within a population participating in health-related research. Krueger et al [10] studied the mortality attributable to a low education level in the United States across several demographics, where they found people with an education level below a high school degree to have a higher mortality rate. Anderson-Bill et al [11] examined the demographics and behavioral and psychosocial characteristics of Web-health users (adults who use the Web to find information on health behavior and behavior change) recruited for a Web-based nutrition, physical activity, and weight gain prevention intervention. Their results suggest that users participating in online health interventions are likely “middle-aged, well-educated, upper middle-class women whose detrimental health behaviors put them at risk of obesity, heart disease, some cancers, and diabetes” [11]. These studies describe the demographics of the populations in their studies but do not describe the demographics of health-related social media users.

Previous work has focused on characterizing demographics on health-related social media. Sadah et al [12] analyzed the demographics of health-related social media and found that users of drug review websites and health-related online forums are predominantly women, health-related social media users are generally older than general social media users, black users are underrepresented in health-related social media, users in areas with better access to health care participate more in health-related social media, and the writing level of health-related social media users is lower than the reading level of the general population. Sadah et al [13] also performed a demographic-based content analysis of health-related social media posts to extract top distinctive terms, top drugs and disorders, sentiment, and emotion, finding that the most popular topic varied by demographic, for example, pregnancy was popular with female users, whereas cardiac problems, HIV, and back pain were the most discussed topics by male users. They also found that users with a higher writing level were less likely to express anger in their posts. We expanded upon this work by characterizing and comparing the demographics of health-related social media websites in terms of the frequency of post content categories.

Text Classification in Social Media

Text classification is frequently employed by researchers to gain insights into social media users and trends, both in and out of health-related settings. Sadilek et al [14] studied the spread of infectious diseases by analyzing Twitter data using a support vector machine (SVM) model. Huh et al [15] developed a naïve Bayes model to help WebMD moderators find posts they would likely respond to. Nikfarjam et al [16] proposed a machine learning–based tagger to extract adverse drug reactions from health-related social media. Mislove et al [17] estimated the gender and ethnicity of Twitter users using the reported first name and last name. Sadah et al [12] expanded upon the work of Mislove et al [17] by considering screen names in estimating gender. In this study, we used text classification techniques to identify categories of post content in health-related social media and used the techniques proposed in the studies by Sadah et al [12] and Mislove et al [17] to study the frequency of these categories within several demographics.

Methods

Datasets

For health-related online forums, we selected 2 websites, WebMD and DailyStrength, to cover the different types of health-related online forums that each represents. Whereas WebMD consists of multiple health communities where people ask questions and get responses from community members [18], DailyStrength enables patients to exchange experiences and treatments, discuss daily struggles and successes, and receive emotional support [19]. For each post collected from these websites, we extracted the URL, title, author’s username, post time, body of the post, and name of the message board. For the author of each collected post, we also collected the age, friends, gender, and location, where applicable. Because these sites were crawled at different times, some of the data we collected do not reflect the current availability of certain attributes owing to website format changes; for example, age and gender are currently available from WebMD user profiles but were not available before. In this study, the demographic attributes we used for a source are those available for the majority of posts collected from that source; for example, most of the WebMD posts in our data were collected before age and gender were available, so we did not use these attributes in our analysis of WebMD user demographics. We restricted the posts used from these sources to the first post in each thread. In our analysis, we used the post body, post title, message board name, and username from WebMD and the post body, post title, message board name, and user’s gender, age, and location from DailyStrength.

For general social networks, we chose Twitter and Google+ because they offer interfaces to easily collect their data (in contrast to Facebook). For each Twitter post, we collected the post content, post time, location, and the author’s username and location. For each Google+ post, we collected the title, post time, update time, post content, location, and the author’s username, first and last names, age, gender, and location. As Twitter and Google+ are general social networks, we used 274 representative health-related keywords to filter them as follows: (1) Drugs: from the RxList ranking of the most dispensed prescriptions [20], we selected the 200 most popular drugs. After removing variants of the same drug (eg, different milligram dosages), the final list contained 124 unique drug names. (2) Hashtags: 11 popular health-related Twitter hashtags, such as #BCSM (Breast Cancer and Social Media). (3) Disorders: 81 frequently discussed disorders, such as AIDS and asthma. (4) Pharmaceuticals: the names of the 12 largest pharmaceutical companies, such as Novartis. (5) Insurance: the names of the 44 biggest insurance companies, such as Aetna and Shield. (6) General: the 2 health-related keywords “healthcare” and “health insurance.” To reach the final keyword counts for hashtags, disorders, pharmaceuticals, and insurance, we sampled each keyword from a larger list for each of these categories and kept the keywords with a high ratio of health-related posts. In our analysis, we used the tweet body, user’s first and last name, and user’s location from Twitter and the post body, post title, and user’s gender, age, first and last name, and location from Google+.
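As an illustration of keyword-based filtering, the following minimal sketch checks whether a post mentions any keyword. It is not the authors' pipeline, which relied on the platforms' own APIs; the function names are hypothetical, and the short keyword sample contains only examples named in the text rather than the full 274-keyword list.

```python
import re

# Category sizes as reported in the text; they should sum to 274.
KEYWORD_COUNTS = {
    "drugs": 124,
    "hashtags": 11,
    "disorders": 81,
    "pharmaceuticals": 12,
    "insurance": 44,
    "general": 2,
}

# Illustrative sample only; the study's full keyword list is not reproduced here.
SAMPLE_KEYWORDS = [
    "#BCSM", "AIDS", "asthma", "Novartis", "Aetna",
    "healthcare", "health insurance",
]


def build_matcher(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole token."""
    # Longer keywords first so multiword phrases are tried before their prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    # Lookarounds serve as word boundaries that also work for hashtags.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def is_health_related(text, matcher):
    """Return True if the post text mentions at least one keyword."""
    return matcher.search(text) is not None


matcher = build_matcher(SAMPLE_KEYWORDS)
total_keywords = sum(KEYWORD_COUNTS.values())
```

Plain matching of this kind also picks up unrelated uses of a keyword (for example, "aids" in "hearing aids"), which is one reason to keep only keywords with a high ratio of health-related posts.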

To retrieve relevant tweets for TwitterHealth, we filtered Twitter with the health-related keyword list using the Twitter streaming application programming interface (API) [21]. Similarly, we used the Google+ API [22] to extract the relevant posts for Google+Health. For the health-related online forums WebMD and DailyStrength, we built a crawler for each website in Java using jsoup [23], a library for extracting and parsing HTML content. Table 1 lists, for each source, the number of posts collected, the date ranges of collected posts, and whether the demographic attributes used in this study are present; Table 2 lists the distribution of demographics for each source across each demographic attribute. For all 4 sources, we did not specifically restrict our search to English-language posts aside from using English drug names; however, the majority of posts collected from these sources were in English.

Table 1.

List of all sources used with their number of posts, date range of posts, and the available demographic attributes.

Source	Number of posts	Date range	Gender	Age	Ethnicity	Location
TwitterHealth [24]	11,637,888	May 2, 2013 to November 11, 2013	Gender classifier [17]	No^a	Ethnicity classifier [17]	Yes^b
Google+Health [25]	186,666	August 24, 2009 to January 5, 2014	Yes	Yes	Ethnicity classifier [17]	Yes
DailyStrength [26]	1,319,622	June 21, 2006 to December 3, 2017	Yes	Yes	No	Yes
WebMD [27]	318,297	December 24, 2006 to May 11, 2019	Gender classifier [12]	No	No	No

Table 2.

Distribution of demographics for each source across each demographic attribute.

Attribute and demographic		TwitterHealth, %	Google+Health, %	DailyStrength, n (%)	WebMD, n (%)
Gender
	Male	48.19^a	64.64^a	95,269 (17.26)^b	6769 (32.41)^b
	Female	51.81^a	35.36^a	456,600 (82.74)^b	14,117 (67.59)^b
Age (years)
	0-17	N/A^c	3.42^a	6656 (1.33)^b	N/A
	18-34	N/A	53.21^a	187,966 (37.55)^b	N/A
	35-44	N/A	21.89^a	126,646 (25.30)^b	N/A
	45-64	N/A	19.02^a	149,487 (29.86)^b	N/A
	≥65	N/A	2.46^a	29,847 (5.96)^b	N/A
Ethnicity
	Asian	3.24^a	5.60^a	N/A	N/A
	Black	0.30^a	0.30^a	N/A	N/A
	Hispanic	23.50^a	17.40^a	N/A	N/A
	White	73.00^a	76.60^a	N/A	N/A
Region
	Northeast	165,531 (19.83)^d	2598 (17.86)^d	73,221 (19.58)^b	N/A
	Midwest	174,620 (20.92)^d	2393 (16.45)^d	84,302 (22.55)^b	N/A
	South	313,350 (37.53)^d	4863 (33.44)^d	123,556 (33.05)^b	N/A
	West	181,400 (21.73)^d	4690 (32.25)^d	92,809 (24.82)^b	N/A

Category	Health-related online forums	General social networks	Example
Share experiences	Yes	Yes	“I could not work after Tylenol.” “I have taken Lipitor every day.”
Ask for specific medical advice or information	Yes	Yes	“Is honey allowed for diabetics?”
Request or give psychological support	Yes	Yes	“I hope your diabetes is under control.” “We’re thinking of you.”
About family (not about self)	Yes	Yes	“My son is now nine months old and teething like crazy.”
Share news	No	Yes	“Kaiser Permanente Invites Software Developers To Build Apps—Forbes. http://feedly.com/k/Zojwq”
Jokes	No	Yes	“Got any jokes about Sodium Hypobromite? NaBro.”
Advertisements	No	Yes	“Check out these two vitamins for one recipe! http://bit.ly/1471dbn”
Personal opinion	No	Yes	“Main frustration of lupus is losing the ability to do things that used to be normal”
Educational material	No	Yes	“Side Effects of Alzheimer’s and Dementia Drugs http://bit.ly/cK7L1f”

Category	WebMD	TwitterHealth	Google+Health
Share experiences	0.349	0.446	0.109
Ask for specific medical advice or information	0.768	0.225	0.108
Request or give psychological support	0.219	0.090	−0.007
About family (not about self)	0.736	0.322	−0.010
Share news	N/A^a	0.083	0.083
Jokes	N/A	0.177	0.029
Advertisement	N/A	0.220	0.107
Personal opinion	N/A	0.103	0.038
Educational material	N/A	0.164	0.091

Source	Extracted features
WebMD	Title, body, and board name
DailyStrength	Title, body, and board name
Google+	Title and body
Twitter	Body

Classifier and hyperparameter		Values
Random forest
	Maximum tree depth	2, 4, 8, 16, 32, 64
	Number of trees, n	10, 100, 1000
Support vector machine
	C	0.001, 0.01, 0.1, 1, 10
	Loss function	Hinge, squared hinge
Convolutional neural network
	Filter window sizes	(2, 3, 4), (3, 4, 5), (4, 5, 6)
	Feature maps per filter window size, n	100, 200, 300, 400, 500, 600
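The hyperparameter values listed above can be expanded into candidate configurations as in the sketch below. It assumes every combination of values is tried for each classifier, which the table does not state; the dictionary keys and the `expand_grid` helper are hypothetical names chosen for illustration.

```python
from itertools import product

# Hyperparameter values as listed in the table above.
GRIDS = {
    "random_forest": {
        "max_depth": [2, 4, 8, 16, 32, 64],
        "n_trees": [10, 100, 1000],
    },
    "svm": {
        "C": [0.001, 0.01, 0.1, 1, 10],
        "loss": ["hinge", "squared_hinge"],
    },
    "cnn": {
        "filter_window_sizes": [(2, 3, 4), (3, 4, 5), (4, 5, 6)],
        "feature_maps": [100, 200, 300, 400, 500, 600],
    },
}


def expand_grid(grid):
    """Return one dict per combination of hyperparameter values."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Assumption: an exhaustive search over each classifier's grid.
configs = {model: expand_grid(grid) for model, grid in GRIDS.items()}

for model, settings in configs.items():
    print(model, len(settings))
```

Under that assumption, the search covers 18 random forest, 10 support vector machine, and 18 convolutional neural network configurations per category.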

Source and category		Random forest		Support vector machine		Convolutional neural network
		Accuracy, n (%)	Balanced accuracy	Accuracy, n (%)	Balanced accuracy	Accuracy, n (%)	Balanced accuracy
WebMD
	Share experiences^a	41 (82)	0.83^b	41 (82)	0.81	41 (82)	0.82
	Ask for specific medical advice or information^a	40 (80)	0.82	41 (82)	0.83^b	37 (74)	0.76
	Request or give psychological support^a	39 (78)	0.71	43 (86)	0.8^b	38 (76)	0.68
	About family (not about self)^a	38 (76)	0.56	40 (80)	0.89^b	47 (94)	0.81
DailyStrength
	Share experiences^a	41 (82)	0.80	40 (80)	0.70	41 (82)	0.82^b
	Ask for specific medical advice or information^a	39 (78)	0.71	38 (76)	0.70	37 (74)	0.7^b
	Request or give psychological support	34 (68)	0.68	33 (66)	0.65	38 (76)	0.68^b
TwitterHealth
	Share experiences^a	39 (78)	0.77	41 (82)	0.82^b	43 (86)	0.74
	Share news^a	41 (82)	0.64	40 (80)	0.73	47 (94)	0.81
Google+Health
	Share experiences	44 (88)	0.48	35 (70)	0.72^b	45 (90)	0.60
	Share news	26 (52)	0.48	28 (56)	0.52	33 (66)	0.59^b
	Advertisement	38 (76)	0.59	24 (48)	0.53	42 (84)	0.6^b
	Personal opinion	39 (78)	0.48	37 (74)	0.71^b	42 (84)	0.60
	Educational material^a	40 (80)	0.66	34 (68)	0.76	41 (82)	0.79^b
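Several rows above pair a high accuracy with a balanced accuracy near 0.5. The sketch below shows how this can happen, assuming the standard binary definition of balanced accuracy as the mean of sensitivity and specificity. The confusion counts are hypothetical: they were chosen only to be consistent with a 50-post test set (implied by the accuracy counts in the table) and with one reported row, and they are not the study's actual confusion matrix.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all test posts that were labeled correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical counts: 4 positive and 46 negative posts, with every positive missed.
tp, fp, tn, fn = 0, 2, 44, 4

acc = accuracy(tp, fp, tn, fn)
bal = balanced_accuracy(tp, fp, tn, fn)
print(f"accuracy={acc:.2f}, balanced accuracy={bal:.2f}")
# prints "accuracy=0.88, balanced accuracy=0.48"
```

When a category is rare, a classifier that almost always predicts the majority class scores well on accuracy but close to 0.5 on balanced accuracy, which is why both metrics are worth reading together.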

Training source	Test source	Category	Classifier	Accuracy, n (%)	Balanced accuracy
WebMD	DailyStrength	Psychological support	SVM^a	328 (65.6)	0.656
WebMD	Google+Health	Share experiences	Random forest	428 (85.6)	0.584
DailyStrength	Google+Health^b	Share experiences	CNN^c	383 (76.6)	0.800
Twitter	Google+Health	Share experiences	SVM	408 (81.6)	0.770
Twitter	Google+Health	Share news	CNN	360 (72.0)	0.562

Category	Gender, n (%)
	Male (n=6769)	Female (n=14,117)
Share experiences	3290 (48.60)	4835 (34.25)
Ask for advice	4741 (70.04)	6372 (45.14)
Psychological support	1914 (28.28)	5515 (39.07)
About family	1986 (29.34)	3623 (25.66)
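The percentages in the table above follow directly from the counts and the group totals, as the short check below shows. The variable and function names are hypothetical; the numbers are copied from the table.

```python
# Group totals and per-category counts (male, female) copied from the table above.
MALE_TOTAL = 6769
FEMALE_TOTAL = 14117

COUNTS = {
    "Share experiences": (3290, 4835),
    "Ask for advice": (4741, 6372),
    "Psychological support": (1914, 5515),
    "About family": (1986, 3623),
}


def pct(n, total):
    """Percentage of a group's posts in a category, rounded to 2 decimals."""
    return round(100 * n / total, 2)


rows = {
    category: (pct(male, MALE_TOTAL), pct(female, FEMALE_TOTAL))
    for category, (male, female) in COUNTS.items()
}

for category, (male_pct, female_pct) in rows.items():
    print(f"{category}: male {male_pct}%, female {female_pct}%")

# The male percentages sum to more than 100, which indicates that a single
# post can be counted in more than one category.
male_sum = sum(male_pct for male_pct, _ in rows.values())
```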

Attribute and demographic		Total number of participants	Share experiences, n (%)	Ask for advice, n (%)
Gender
	Male	95,269	78,760 (82.67)	31,706 (33.28)
	Female	456,600	409,640 (89.72)	167,867 (36.76)
Age group (years)
	0-17	6656	6175 (92.77)	1694 (25.45)
	18-34	187,966	173,226 (92.16)	65,191 (34.68)
	35-44	126,646	113,796 (89.85)	48,335 (38.17)
	45-64	149,487	127,089 (85.02)	54,008 (36.13)
	≥65	29,847	24,420 (81.82)	10,581 (35.45)
Region
	Northeast	73,221	65,761 (89.81)	28,196 (38.51)
	Midwest	84,302	76,630 (90.90)	31,600 (37.48)
	South	123,556	110,597 (89.51)	46,933 (37.99)
	West	92,809	76,797 (82.75)	31,481 (33.92)

Region	Share experiences	Ask for advice
Northeast	WHY WEIGHT? LET’S LOSE WEIGHT AND FEEL GREAT!; Self-Hate Syndrome; Smoking Addiction & Recovery; Urinary Incontinence; Families of Prisoners; Agoraphobia & Social Anxiety; Cocaine Addiction & Recovery; Obesity; CHRISTIAN PARENTS of ESTRANGED ADULT CHILDREN; Brain Injury	WHY WEIGHT? LET’S LOSE WEIGHT AND FEEL GREAT!; Obesity; Hidradenitis Suppurativa; Endometriosis; Deep Vein Thrombosis (DVT); Atrial Fibrillation (AFib); Diets & Weight Maintenance; Gastritis; Polycystic Kidney Disease (PKD); Hypothyroidism
Midwest	Just support; acoa sanctuary; helping with the housework; kindredspirits; The Coffee Shop; aa Spoken Here; Highly Sensitive People HSP; Financial Challenges; I can’t HEAR you!; Pseudotumor Cerebri	kindredspirits; Neurocardiogenic Syncope; Pseudotumor Cerebri; Gastritis; Irritable Bowel Syndrome (IBS); COPD & Emphysema; Parkinson’s Disease; Polycystic Kidney Disease (PKD); Pancreatitis; Graves’ Disease
South	prompts; Beyond Medication; InHisCare Bible Study; Ticked off about Lyme; Muscular Dystrophies; aa friends; Anxiety and POSITIVE CHOICES; Games for Fun and Relaxation; MS People Dealing with MS Pain; Parents Whose children have been sexually abused	MS People Dealing with MS Pain; High Cholesterol; Cirrhosis; Polymyositis & Dermatomyositis; Addison’s Disease; Meniere’s Disease; MCTD; Trying To Conceive; Endometriosis; Polycystic Ovarian Syndrome (PCOS)
West	A Little Bit Of Kindness Goes A long Way!; The Walking Group; Alanon support group; VOICES OF RECOVERY; AlAnon One Day At A Time; BIBLICAL STUDIES; The Sunflower group; My Favorite Things.; FrIeNdShIpRoOm; three prayerpraise	AlAnon One Day At A Time; Banana; The Sunflower group; WINGS; VOICES OF RECOVERY; A Laughter Club; FrIeNdShIpRoOm; Myofascial Pain Syndrome; Hemochromatosis; Colon Cancer

Gender	Share experiences	Ask for advice	Psychological support	About family
Male	Men’s Health; Erectile Dysfunction; Relationships and Coping; Cholesterol Management; Epilepsy; Depression; Allergies; Oral Health; Knee & Hip Replacement; Ear, Nose & Throat	Erectile Dysfunction; Cholesterol Management; Men’s Health; HIV/AIDS; Depression; Epilepsy; Prostate Cancer; Sports Medicine; Pain Management; Ear, Nose & Throat	Relationships and Coping; Epilepsy; Depression; Back Pain; Heart Disease; Pain Management; Anxiety & Panic; Clomid; Diabetes; Parenting: 4 & 5-Year-Olds	Relationships and Coping; Depression; Erectile Dysfunction; Back Pain; Clomid; Epilepsy; Anxiety & Panic; Pain Management; Sleep Disorders; Digestive Disorders
Female	Sexual Abuse Survivors Support; Trying to Conceive: 12 Months, Still Trying; Endometriosis; Breast Cancer; Infertility Treatment; Pregnancy: After Infertility; Pregnancy: After 35; Parenting: Elementary Ages; Self-Harm; Menopause	Trying to Conceive: 12 Months, Still Trying; Infertility Treatment; Dieting Club: 25-50 Lbs; Parenting: Preteens & Teenagers; Skin & Beauty; Breast Cancer; Food & Cooking; Lupus; Parenting: 3-Year-Olds; Parenting: 9-12 Months	Chronic Fatigue Syndrome; Lupus; Sexual Abuse Survivors Support; Breast Cancer; Endometriosis; Dieting Club: 10-25 Lbs; Trying to Conceive: 12 Months, Still Trying; Pregnancy: After 35; Dieting Club: 100+ Lbs; Pregnancy: After Infertility	Sexual Abuse Survivors Support; Pregnancy: After 35; Trying to Conceive: 12 Months, Still Trying; Trying to Conceive: After Loss; Breast Cancer; Self-Harm; Parenting: Preteens & Teenagers; Parenting: 9-12 Months; Dieting Club: 50-100 Lbs; Parenting: 6-9 Months

Gender	Share experiences	Ask for advice
Male	Vow To Live LGBT Against Suicide; Christian Church 24.7 Ministry; Gay Men’s Challenges; Single Dads; GOYA; Dealing with Diabetes2 and remembering Goldi; A Child Abuse Survivors Group; CALM and EASY GAMES; Financial Challenges; Liars Anonymous	A Laughter Club; Dealing with Diabetes2 and remembering Goldi; Impotence & Erectile Dysfunction; Sex/Pornography Addiction; High Cholesterol; Tinnitus, Deafness and Ear Problems; Urinary Incontinence; Atrial Fibrillation (AFib); MRSA; LDN .. Low Dose Naltrexone
Female	helping with the housework; Lesbian Relationship Challenges; prompts; AlAnon One Day At A Time; Daughters of Abusive Mothers; Breastfeeding; Parenting Toddlers (1-3); Post-Partum Depression; Infertility; Vulvar Cancer	Pregnancy; Menopause; Trying To Conceive; Miscarriage; Polycystic Ovarian Syndrome (PCOS); Family & Friends of Bipolar; WHY WEIGHT? LET’S LOSE WEIGHT AND FEEL GREAT!; Infertility; Vulvar Cancer; Breastfeeding

Attribute and demographic		Total number of participants	Share experiences, n (%)	Share news, n (%)
Gender
	Male	16,092	3188 (19.81)	1277 (7.94)
	Female	17,850	4835 (27.09)	1091 (6.11)
Ethnicity
	Asian	626	166 (26.52)	34 (5.43)
	Black	56	12 (21)	3 (5)
	Hispanic	2833	826 (29.16)	155 (5.47)
	White	9992	2259 (22.61)	728 (7.29)
Region
	Northeast	5362	1093 (20.38)	545 (10.16)
	Midwest	4686	1084 (23.13)	380 (8.11)
	South	9855	2162 (21.94)	850 (8.63)
	West	5448	1164 (21.37)	515 (9.45)

Category	Male	Female	Asian	Black	Hispanic	White	Northeast	Midwest	South	West
Share Experiences	<.001	.47	.24	.80	.68	.15	.13	.048	.002	<.001
Share News	<.001	<.001	<.001	.23	<.001	<.001	<.001	<.001	<.001	<.001

Age group (years)	Share experiences	Ask for advice
0-17	Weight Loss For Teens; Gay & Lesbian Teens; Depression–Teen; Bipolar Disorder–Teen; Self-Injury; Transgender; Depression; Coming Out; Bisexuality; Eating Disorders	Weight Loss For Teens; Depression–Teen; Self-Injury; Eating Disorders; Anxiety
18-34	Sunny and Peaceful Skies; Parenting Toddlers (1-3); Daily Positive Thoughts; Trying To Conceive; Parenting Newborns & Infants (0-1); College Stress; Arnold-Chiari Malformation; ALL MOODY BLUES; Career Changes; Cerebral Palsy	Trying To Conceive; Neuropathy; Pregnancy; Miscarriage; Polycystic Ovarian Syndrome (PCOS); Cerebral Palsy; Endometriosis; Pseudotumor Cerebri; Sexually Transmitted Diseases–Female; Schizophrenia
35-44	Vow To Live LGBT Against Suicide; Parenting 'Tweens (9-12); Twins, Triplets & More; Self-Hate Syndrome; Parents Whose children have been sexually abused; HOPEFUL HEARTS...LIVING AGAIN AFTER THE LOSS; Neurofibromatosis; Breastfeeding; Hyperparathyroidism; Stillbirth	kindredspirits; Hyperparathyroidism; Multiple Sclerosis (MS); Pseudotumor Cerebri; Allergies; Hemochromatosis; Hypothyroidism; Addison’s Disease; MCTD; Graves’ Disease
45-64	acoa sanctuary; prompts; Christians with MS; InHisCare Bible Study; The Serenity Room; Ticked off about Lyme; Biblical Studies and Archaeology; Alanon support group; Just support; WHY WEIGHT? LET’S LOSE WEIGHT AND FEEL GREAT!	WHY WEIGHT? LETS LOSE WEIGHT AND FEEL GREAT!; MS People Dealing with MS Pain; Dealing with Diabetes2 and remembering Goldi; Multiple Myeloma; Menopause; High Cholesterol; LDN .. Low Dose Naltrexone; Myofascial Pain Syndrome; Neurocardiogenic Syncope; Amputees
≥65	Banana; A Little Bit Of Kindness Goes A long Way!; AlAnon One Day At A Time; VOICES OF RECOVERY; The Walking Group; The Front Porch; Over The Fence; Muscular Dystrophies; CALM and EASY GAMES; movie lovers	AlAnon One Day At A Time; VOICES OF RECOVERY; I can’t HEAR you!; COPD & Emphysema; Meniere’s Disease; Parkinson’s Disease; Sleep Apnea; Interstitial Cystitis (IC); Atrial Fibrillation (AFib); Acromegaly
