Author manuscript; available in PMC: 2019 Jan 1.
Published in final edited form as: Comput Human Behav. 2017 Sep 6;78:98–112. doi: 10.1016/j.chb.2017.09.001

Examining Thematic Similarity, Difference, and Membership in Three Online Mental Health Communities from Reddit: A Text Mining and Visualization Approach

Albert Park a, Mike Conway a, Annie T Chen b
PMCID: PMC5810583  NIHMSID: NIHMS910019  PMID: 29456286

Abstract

Objectives

Social media, including online health communities, have become popular platforms for individuals to discuss health challenges and exchange social support with others. These platforms can provide support for individuals who are concerned about social stigma and discrimination associated with their illness. Although mental health conditions can share similar symptoms and even co-occur, the extent to which discussion topics in online mental health communities are similar, different, or overlapping is unknown. Discovering the topical similarities and differences could potentially inform the design of related mental health communities and patient education programs. This study employs text mining, qualitative analysis, and visualization techniques to compare discussion topics in publicly accessible online mental health communities for three conditions: Anxiety, Depression and Post-Traumatic Stress Disorder.

Methods

First, online discussion content for the three conditions was collected from three Reddit communities (r/Anxiety, r/Depression, and r/PTSD). Second, content was pre-processed, and then clustered using the k-means algorithm to identify themes that were commonly discussed by members. Third, we qualitatively examined the common themes to better understand them, as well as their similarities and differences. Fourth, we employed multiple visualization techniques to form a deeper understanding of the relationships among the identified themes for the three mental health conditions.

Results

The three mental health communities shared four themes: sharing of positive emotion, gratitude for receiving emotional support, and sleep- and work-related issues. Depression clusters tended to focus on self-expressed contextual aspects of depression, whereas the Anxiety Disorders and Post-Traumatic Stress Disorder clusters addressed more treatment- and medication-related issues. Visualizations showed that discussion topics from the Anxiety Disorders and Post-Traumatic Stress Disorder subreddits were more similar to one another than to those from the Depression subreddit.

Conclusions

We observed that the members of the three communities shared several overlapping concerns (i.e., sleep- and work-related problems) and discussion patterns (i.e., sharing of positive emotion and showing gratitude for receiving emotional support). We also highlighted that the discussions from the r/Anxiety and r/PTSD communities were more similar to one another than to discussions from the r/Depression community. The r/Anxiety and r/PTSD subreddit members are more likely to be individuals whose experiences with a condition are long-term, and who are interested in treatments and medications. The r/Depression subreddit members may be a comparatively diffuse group, many of whom are dealing with transient issues that cause depressed mood. The findings from this study could be used to inform the design of online mental health communities and patient education programs for these conditions. Moreover, we suggest that researchers employ multiple methods to fully understand the subtle differences when comparing similar discussions from online health communities.

Keywords: Consumer Health Information, Anxiety Disorders, Depression, Stress Disorders, Post-Traumatic, Unsupervised Machine Learning

1. Introduction

Social media platforms, including online health communities, have become popular resources to exchange social support [1] (e.g., informational, emotional, instrumental) with others [2, 3]. This movement helps individuals to cope with and manage their illnesses while also providing the means to overcome barriers like geographical isolation, physical challenges, and stigma of disease. For example, one in four Internet users living with a chronic condition sought information from a peer with a similar condition by 2011 [4]. Peers can also offer advice on condition management [5], emotional support [6, 7, 8], and information to address everyday issues [2, 9].

Studies have consistently shown that individuals can benefit from interacting with others in similar circumstances. Online interactions have been shown to reduce depression [10, 11, 12, 13, 14], anxiety [12, 13, 15], stress [12, 13], and negative mood [16], as well as to facilitate coping [17] and empowerment [10, 11, 13, 15, 17, 6]. Moreover, members of online health communities consistently emphasize the benefits of participation with respect to their treatment decisions, symptom management, clinical management, and outcomes [10, 11].

Individuals suffering from mental disorders often experience difficulty in obtaining support from their immediate social ties due to social stigma and discrimination associated with their illnesses [18, 19]. For such individuals, online health communities can be a useful medium to express their thoughts and feelings. However, extant literature has also reported that negative emotion can spread through interaction [20], and members of mental health communities have shown significant increases in anxiety, anger, and negative emotion following reports of celebrity suicides [21].

Although results are mixed with respect to the effect of social media on mental health outcomes, social media continue to grow in popularity. Additionally, an increasing number of researchers have employed statistical methods to study mental health in social media¹ [22]. For example, researchers have found that individuals engage in discussions about their mental illness on social media [18, 23], found associations between use of multiple social media platforms and symptoms of depression and anxiety [24], found chatter supporting marijuana use for Post-Traumatic Stress Disorder (PTSD) treatment [25], used social media to predict individuals at risk for depression [26], compared the longitudinal psychological changes in members of an online depression health community against members of other online health communities [27], classified social media content according to depressive symptoms [28, 23], characterized smoking and drinking problems [29, 30], tracked opioid-related discussions [31], and classified substance addiction phases [32].

In this study, we examine the nature of online discussion pertaining to three mental health conditions: anxiety, depression, and PTSD. The connection between anxiety and depression, including their shared symptomatology and co-morbidity, has been a subject of previous research [33]. Moreover, it has been observed that anxiety and depression often co-occur in the presence of stressful and traumatic events, and in connection with other health conditions such as chronic pain [34, 35]. Thus, the extent to which discussion topics in these communities are similar, different, or overlapping is of interest. However, to our knowledge, comparison of discussion topics has not been a focus of previous research on online mental health communities.

In addition, individuals often turn to online communities to seek informational, emotional, and other types of social support [36]. Although the types of social support exchanged with regard to depression have been studied [19, 37, 38], less is known about online social support exchanges concerning the other two conditions of interest. Discovering the most important discussion topics and understanding how members use their respective communities could potentially inform the design of related mental health communities and patient education programs.

We aim to fill these gaps in the literature with this study and answer three research questions (RQ):

  • RQ1: what are the main themes expressed in the communities?

  • RQ2: how much thematic overlap, similarity, and difference exists among the communities?

  • RQ3: what can we understand about the overlapping member base?

Our approach employs document clustering techniques, along with qualitative and visual analysis, to compare discussion topics in online health communities focusing on anxiety, depression, and PTSD. In this work, we focus on Reddit, a highly popular social media platform. Reddit has been shown to be widely used for discussing stigmatized illnesses, including mental disorders [18]. We focus on discussion topics in the following three sub-communities: r/Anxiety, r/Depression, and r/PTSD².

2. Material and methods

2.1. Data collection

Our corpus was based on Reddit (http://www.reddit.com/), a popular social networking, online gathering, and news exchange platform. In 2015, Reddit members participated in over 88,000 subreddits (i.e., topically focused sub-communities) and generated 83 billion page views. Reddit members wrote over 73 million individual posts (i.e., submissions that start a conversation) along with over 725 million comments (i.e., submissions that reply to posts or other comments) in subreddits [39].

We used the Python Reddit API Wrapper (PRAW) [41], a Python client for Reddit’s official Application Programming Interface (API) [40], to collect data from Reddit.com. Reddit allows members to organize sub-communities called ‘subreddits’. We focused on three subreddits: r/Anxiety, r/Depression, and r/PTSD. We collected the title, author id, timestamp, post or comment id, parent id (i.e., the comment or post to which the author was replying), number of direct replies, score (i.e., the difference between up votes and down votes), and content. All three subreddits are public communities, allowing anyone with Internet access to view the content. By December 2015, the r/Anxiety subreddit had been active for 7 years with 73,501 members; the r/Depression subreddit had been active for 6 years with 126,348 members; and the r/PTSD subreddit had been active for 7 years with 5,257 members.

Between October and December 2015, we downloaded a total of 7,410 posts and 132,599 associated comments made by 41,967 unique members. These comments were written between January 2011 and December 2015. The Reddit API limited downloads to 1,000 posts at a time. For this reason, we combined several collection methods to build a larger dataset covering the range of topics discussed in these subreddits.

First, we downloaded the ‘top’ rated posts of all time. Top posts are determined by members’ ‘up’ and ‘down’ votes over the lifetime of the posts [42]. We started with the top rated posts to systematically collect the topics most relevant to each community as a whole. The communities’ collective opinions were a logical starting point for identifying those topics.

Second, we supplemented our dataset by collecting ‘hot’ posts, i.e., recent top rated posts (see [43] for an account of the Reddit voting system). We added hot posts to cover newly emerging, important, and informative topics; however, we noticed a large overlap with the top rated posts.

Third, we added ‘new’ posts (i.e., the most recent posts) spanning up to 15 days, to cover the diverse topics shared in these communities.

Fourth, we removed repeated posts.

The r/Depression subreddit was far more active than the r/Anxiety or r/PTSD subreddits in terms of the total number of posts, unique members, words, and associated comments. To gather comparably sized datasets, we downloaded only 3 days of new posts from the r/Depression subreddit, but 15 days of new posts from the r/Anxiety and r/PTSD subreddits. Table 1 summarizes the dataset used for this research.
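As a minimal sketch of the collection steps above, the following merges the three listings and removes repeated posts. It assumes the ‘top’, ‘hot’, and ‘new’ listings have already been fetched; PRAW exposes listings with these names on a subreddit object. Each post is held in a hypothetical `Post` record. `merge_listings`, its parameters, and the demo timestamps are illustrative and are not the authors’ code.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one collected post (a subset of the fields in Section 2.1)."""
    post_id: str
    author_id: str
    created_utc: float  # Unix timestamp of creation
    score: int          # up votes minus down votes


def merge_listings(top, hot, new, collected_at, new_window_days):
    """Combine 'top', 'hot', and 'new' listings into one list without repeated posts.

    'new' posts are kept only if created within `new_window_days` of `collected_at`
    (15 days for r/Anxiety and r/PTSD, 3 days for r/Depression in the study).
    The first occurrence of each post id is kept, so top posts take precedence.
    """
    cutoff = collected_at - new_window_days * SECONDS_PER_DAY
    recent_new = [p for p in new if p.created_utc >= cutoff]
    merged, seen = [], set()
    for post in [*top, *hot, *recent_new]:
        if post.post_id not in seen:
            seen.add(post.post_id)
            merged.append(post)
    return merged


if __name__ == "__main__":
    now = 1_450_000_000.0  # an arbitrary collection time in December 2015
    a = Post("a", "u1", now - 400 * SECONDS_PER_DAY, 950)
    b = Post("b", "u2", now - 2 * SECONDS_PER_DAY, 120)
    c = Post("c", "u3", now - 1 * SECONDS_PER_DAY, 3)
    d = Post("d", "u4", now - 20 * SECONDS_PER_DAY, 1)
    merged = merge_listings([a, b], [b], [c, d], collected_at=now, new_window_days=15)
    print([p.post_id for p in merged])  # ['a', 'b', 'c']
```

Keeping the first occurrence means that a post appearing in both the top and hot listings is retained once, in its top-listing position.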

Table 1.

Characteristics of three Reddit communities studied

| | r/Anxiety | r/Depression | r/PTSD |
|---|---|---|---|
| Dates posts were written | 9/2011–12/2015 | 1/2011–10/2015 | 7/2011–12/2015 |
| Num. of posts | 3,677 | 1,934 | 1,799 |
| Num. of comments | 49,929 | 67,599 | 15,071 |
| Num. of members | 15,336 | 23,916 | 2,712 |
| Range of num. of comments per post | 1 to 200 | 1 to 201 | 1 to 87 |
| Mean num. of comments per post (SD) | 13.58 (19.61) | 34.95 (41.17) | 8.38 (7.56) |
| Median num. of comments per post | 6 | 18 | 7 |
| Total num. of words | 3,079,219 | 3,573,228 | 1,615,328 |
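The descriptive statistics in Table 1 can be computed from per-post comment counts with Python’s standard `statistics` module. The sketch below is illustrative. `comment_stats` is a hypothetical helper, and using the sample standard deviation is an assumption, since the estimator is not reported. The sketch also checks that each mean equals total comments divided by total posts. The per-subreddit counts likewise sum to the 7,410 posts and 132,599 comments reported in Section 2.1.

```python
import statistics


def comment_stats(comments_per_post):
    """Table 1-style summary of the number of comments attached to each post."""
    return {
        "range": (min(comments_per_post), max(comments_per_post)),
        "mean": round(statistics.mean(comments_per_post), 2),
        # Sample standard deviation; the paper does not say which estimator it used.
        "stdev": round(statistics.stdev(comments_per_post), 2),
        "median": statistics.median(comments_per_post),
    }


# Consistency check against Table 1: mean comments per post = total comments / total posts.
for name, comments, posts in [("r/Anxiety", 49_929, 3_677),
                              ("r/Depression", 67_599, 1_934),
                              ("r/PTSD", 15_071, 1_799)]:
    print(name, round(comments / posts, 2))  # prints 13.58, 34.95, and 8.38 in turn
```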

We restricted our analysis to publicly available discussion content. The University of Utah’s Institutional Review Board (IRB) [ethics committee] exempted the study procedure and data from review (IRB_00076188) under Exemption 2 as defined in United States Federal Regulations 45 CFR 46.101(b). Although the content is publicly available, members of the communities we studied are nevertheless living with stigmatized illnesses. For this reason, we followed the guidelines suggested by Bruckman [44] and Eysenbach [45] to modify and de-identify our example quotes, to ensure members’ anonymity and protect their privacy.

2.2. RQ1: What are the main themes expressed in the communities?

Automatically identifying discussion themes involves organizing data content using knowledge resources like the Unified Medical Language System (UMLS) [46] or cluster analysis. In this study, we used k-means clustering, a widely used unsupervised clustering algorithm [47]. Previous research has employed document clustering techniques to analyze discussion content in online health communities [48]. Document clustering techniques create clusters of documents that are similar to one another but dissimilar from documents in other clusters [47]. In other words, clustering groups documents that share many of the same terms into topically similar clusters. This makes it useful for identifying the main discussion themes in a large collection of documents. We elected to use an unsupervised algorithm because no ground truth (labeled) dataset was available.
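To make the clustering idea concrete, here is a dependency-free sketch of k-means on sparse term vectors. It assigns each document to its most cosine-similar centroid, which makes it a spherical k-means variant. This is a swapped-in illustration, not the study’s configuration. scikit-learn’s `KMeans`, which the authors ran with default parameters, minimizes squared Euclidean distance and initializes centroids with k-means++. The function names and the random-document initialization are assumptions made for the sketch.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Component-wise mean of sparse vectors."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans_cosine(docs, k, seed=0, max_iter=100):
    """Assign each document to its most cosine-similar centroid, then recompute centroids.

    Different seeds pick different starting documents and can yield different clusters.
    """
    rng = random.Random(seed)
    centroids = [dict(docs[i]) for i in rng.sample(range(len(docs)), k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(doc, centroids[c])) for doc in docs]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return labels


if __name__ == "__main__":
    docs = [{"sleep": 1.0, "night": 1.0}, {"sleep": 1.0, "bed": 1.0},
            {"job": 1.0, "work": 1.0}, {"work": 1.0, "boss": 1.0}]
    for seed in range(3):
        print(seed, kmeans_cosine(docs, k=2, seed=seed))
```

Running the demo with several seeds shows how the choice of starting documents can change the final partition. That sensitivity is what motivates the repeated-run validity check described in this section.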

We first used the Python Natural Language Toolkit (NLTK) (Version 3.1) [49] and Scikit-learn (Version 0.17) [50] to pre-process our dataset. Pre-processing involved tokenization and the removal of stop words, punctuation, and high- and low-frequency terms. Each post and its associated comments were treated as a single document. We then represented the documents in a vector space by generating term frequency matrices, weighting terms by their normalized term frequencies [50]. We used Scikit-learn [50] to cluster our data with its default parameters for k-means clustering and estimated topic similarity with cosine similarity. After experimenting with varying numbers of clusters for each condition, we generated 15 clusters per community to allow comparison.
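The pre-processing and weighting steps can be sketched in plain Python as follows. The tiny `STOP_WORDS` set, the regular-expression tokenizer, and the `min_df`/`max_df_ratio` thresholds are illustrative assumptions; the study’s thresholds are not reported. In scikit-learn, a roughly comparable pipeline would be `TfidfVectorizer(use_idf=False, norm="l2", stop_words=..., min_df=..., max_df=...)`, which yields L2-normalized term frequencies.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; NLTK ships a much longer English list.
STOP_WORDS = {"a", "an", "and", "i", "in", "is", "it", "me", "my", "of", "the", "to", "you"}


def tokenize(text):
    """Lowercase, drop punctuation by keeping only alphanumeric runs, remove stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def build_documents(threads):
    """Treat each post together with its associated comments as a single document.

    threads: iterable of (post_text, [comment_text, ...]) pairs.
    """
    return [tokenize(" ".join([post, *comments])) for post, comments in threads]


def drop_rare_and_common(docs, min_df=2, max_df_ratio=0.9):
    """Remove low- and high-frequency terms by document frequency (thresholds are assumptions)."""
    df = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    keep = {t for t, c in df.items() if c >= min_df and c / n_docs <= max_df_ratio}
    return [[t for t in doc if t in keep] for doc in docs]


def normalized_tf(doc):
    """Term-frequency vector scaled to unit (L2) length, as a {term: weight} dict."""
    counts = Counter(doc)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


if __name__ == "__main__":
    threads = [("Cannot sleep", ["The night is long"]),
               ("Work stress", ["My boss and my job"]),
               ("Sleep at night", ["Work helps me sleep"])]
    docs = drop_rare_and_common(build_documents(threads))
    print(docs)  # [['sleep', 'night'], ['work'], ['sleep', 'night', 'work', 'sleep']]
    vectors = [normalized_tf(d) for d in docs]
```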

One limitation of the k-means clustering algorithm is that it can produce different clusters depending on the starting seeds [47]. Therefore, to check the validity of our clustering result, we repeated the clustering process and the manual assignment of descriptive labels 10 times, each with 15 clusters, using the same parameters and procedures. Each time, we manually labeled the clusters according to their dominant themes and created new labels when needed. If the k-means algorithm produced identical results in all 10 runs, we would need only 15 labels to describe all 150 clusters. If the results were vastly different, we would need up to 150 labels. We then calculated the overlapping terms for identically labeled clusters generated by different runs. For example, if runs 1 and 2 both produced cluster ‘A’, we calculated the overlapping terms of the two ‘A’ clusters. We considered the 50 most frequently occurring terms when calculating the overlapping terms. We also tracked the total number of labels required to describe all 150 clusters. We used the degree of overlap between identically labeled clusters in solutions with different starting seeds as a validity check of our cluster solution. We provide detailed information on our validity check process in the appendix.
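The validity check compares the most frequent terms of identically labeled clusters across repeated runs. The sketch below assumes each run has already been labeled by hand and summarized as `{label: set of top-50 terms}`. The helper names and the normalization by the larger term set are assumptions, since the exact overlap formula is deferred to the appendix.

```python
from collections import Counter


def top_terms(cluster_docs, n=50):
    """The n most frequent terms in one cluster; documents are token lists."""
    counts = Counter(t for doc in cluster_docs for t in doc)
    return {t for t, _ in counts.most_common(n)}


def label_overlap(run_a, run_b):
    """Mean share of top terms shared by identically labeled clusters in two runs.

    run_a, run_b: {label: set of top terms}. Normalizing by the larger set is an
    assumption; with two full top-50 sets it equals |shared terms| / 50.
    Returns None when the runs have no label in common.
    """
    common = run_a.keys() & run_b.keys()
    if not common:
        return None
    shares = [len(run_a[label] & run_b[label]) / max(len(run_a[label]), len(run_b[label]), 1)
              for label in common]
    return sum(shares) / len(shares)


def unique_label_count(runs):
    """Labels needed to describe every cluster in every run (15 if all runs agree, up to 150)."""
    return len({label for run in runs for label in run})


if __name__ == "__main__":
    run1 = {"sleep": {"sleep", "night", "bed", "wake"}, "work": {"job", "work", "boss", "day"}}
    run2 = {"sleep": {"sleep", "night", "bed", "tired"}, "school": {"school", "class", "exam", "grade"}}
    print(label_overlap(run1, run2), unique_label_count([run1, run2]))  # 0.75 3
```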

To characterize each cluster, we qualitatively examined the most frequent terms in each cluster, as well as the titles and contents of several randomly selected example posts and their associated comments. We followed an open coding process [51] to identify and assign each cluster a descriptive label. This method was used to elicit unknown, emerging themes grounded in the data. We then visualized an overview of discussion themes as a bubble chart and as a network.

We used D3 [52] to construct the bubble chart, making each bubble’s size proportional to the cluster size (i.e., the number of documents in the cluster).

We used Gephi (Version 0.9.1) [53], a popular network visualization tool, to generate the network visualization. We applied the ForceAtlas2 [54] layout to gain an overview of the discussion theme network structure. In this network, each node represented a cluster. We sized the nodes in proportion to the normalized size of the clusters but kept the labels at a fixed size so that labels of smaller nodes remained readable. To determine the edge weights, we used the 20 most frequently occurring words of each cluster as a proxy for that cluster. We then calculated the Jaccard similarity between each pair of clusters and used this similarity as the edge weight between the pair. Jaccard similarity is a common method for comparing the similarity and diversity of sets [55].
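The edge weights for the theme network can be computed as in the following sketch. It takes each cluster’s 20 most frequent terms as a set and scores every pair with Jaccard similarity, |A ∩ B| / |A ∪ B|. The cluster names (borrowing the A/D/P prefixes of Figure 2), the rounding, and the decision to skip zero-weight pairs are illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_edges(cluster_terms):
    """Weighted edges between every pair of clusters.

    cluster_terms: {cluster_name: set of that cluster's 20 most frequent terms}.
    Pairs with no shared terms are skipped (an assumption: they would carry zero weight).
    """
    edges = []
    for (u, terms_u), (v, terms_v) in combinations(sorted(cluster_terms.items()), 2):
        weight = jaccard(terms_u, terms_v)
        if weight > 0:
            edges.append((u, v, round(weight, 3)))
    return edges


if __name__ == "__main__":
    terms = {"A:sleep": {"sleep", "night", "bed"},
             "P:sleep": {"sleep", "night", "nightmares"},
             "D:work": {"job", "work"}}
    print(jaccard_edges(terms))  # [('A:sleep', 'P:sleep', 0.5)]
```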

2.3. RQ2: How much thematic overlap, similarity, and difference exists among the communities?

To examine thematic similarities and differences among the discussion themes identified in RQ1, we first represented the discussion themes as a Venn diagram to visualize thematic overlaps. We then qualitatively compared and contrasted the common themes among the three subreddits.

We also applied the Louvain modularity algorithm [56] available in Gephi to determine the similarity among clusters in the network visualization. Modularity is a widely used measure for identifying community structure in a network. It compares the density of edges within groups of nodes with the density expected if edges were placed at random [57]. For our purposes, modularity can identify natural subgroupings of nodes (i.e., community structure) with respect to frequently occurring terms in the network representation. In other words, modularity can illustrate how topically similar or dissimilar clusters are from one another. We enabled Gephi’s options to use edge weights and to randomize, and set the resolution to 1. The distance between theme nodes can also be influenced by the network layout. We therefore also visualized the Jaccard similarity scores as a heatmap to provide greater detail on topical similarities and differences.
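For a weighted, undirected graph with total edge weight m, modularity can be written as Q = Σ_c [L_c / m − γ (d_c / 2m)²]. Here L_c is the total weight of edges inside community c, d_c is the summed weighted degree of its nodes, and γ is the resolution (1 in this study). The Louvain algorithm repeatedly moves nodes between communities and merges communities to increase Q. The sketch below only scores a given partition, assumes no self-loops, and is not Gephi’s implementation.

```python
from collections import defaultdict


def modularity(edges, communities, resolution=1.0):
    """Modularity Q of a partition of a weighted, undirected graph without self-loops.

    edges: iterable of (u, v, weight); communities: {node: community id}.
    Q = sum over communities c of [L_c / m - resolution * (d_c / (2m))**2].
    """
    m = 0.0
    internal = defaultdict(float)    # L_c: weight of edges with both ends in c
    degree_sum = defaultdict(float)  # d_c: summed weighted degree of nodes in c
    for u, v, w in edges:
        m += w
        degree_sum[communities[u]] += w
        degree_sum[communities[v]] += w
        if communities[u] == communities[v]:
            internal[communities[u]] += w
    if m == 0:
        return 0.0
    return sum(internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    # Two triangles joined by a single edge: the natural two-community split scores highly.
    edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
             ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0), ("c", "d", 1.0)]
    split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(round(modularity(edges, split), 3))  # 0.357 (= 5/14)
```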

2.4. RQ3: What can we understand about the overlapping member base?

We investigated the characteristics of members who participated in multiple subreddits and the most commonly discussed themes by these overlapping members. To explore whether overlapping themes in the three subreddits were determined by overlapping memberships, we identified the five most common themes discussed by the overlapping members and the posting characteristics of these members.
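One way to operationalize the overlapping-member analysis is sketched below. It counts how many subreddits each author appears in, keeps those active in more than one, and tallies the cluster labels of their contributions. The input shapes, the helper names, and the per-contribution tally are assumptions. The study does not specify whether themes were counted per post, per comment, or per member.

```python
from collections import Counter


def overlapping_members(authors_by_subreddit):
    """Authors who posted or commented in more than one subreddit.

    authors_by_subreddit: {subreddit name: set of author ids}.
    """
    membership = Counter(a for authors in authors_by_subreddit.values() for a in authors)
    return {author for author, n in membership.items() if n > 1}


def top_themes(members, contributions, n=5):
    """The n most common cluster labels among contributions by the given members.

    contributions: iterable of (author id, cluster label) pairs, one per post or comment.
    """
    return Counter(label for author, label in contributions if author in members).most_common(n)


if __name__ == "__main__":
    authors = {"Anxiety": {"u1", "u2", "u3"}, "Depression": {"u2", "u4"}, "PTSD": {"u2", "u3", "u5"}}
    contributions = [("u2", "sleep"), ("u2", "work"), ("u3", "sleep"), ("u1", "school"),
                     ("u3", "sleep"), ("u2", "work"), ("u2", "gratitude")]
    overlap = overlapping_members(authors)
    print(sorted(overlap))                          # ['u2', 'u3']
    print(top_themes(overlap, contributions, n=2))  # [('sleep', 3), ('work', 2)]
```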

3. Results

3.1. RQ1: What are the main themes expressed in the communities?

In this section, we present the main themes expressed in the three communities: the r/Anxiety, r/Depression, and r/PTSD subreddits (Figure 1).

Figure 1.

A bubble chart summarizing clustering results of the r/Anxiety (blue), r/Depression (orange), and r/PTSD (green) subreddits.

3.1.1. Anxiety

We generated 15 clusters using the r/Anxiety subreddit discussion content. Many clusters, including SOCIAL ANXIETY³, MEDICATION, SCHOOL, PANIC ATTACK, and THERAPY/THERAPIST, contained terms and labels that clearly differentiated them from one another. However, a few clusters, such as POSITIVE EMOTION and GRATITUDE, shared terms. We distinguished these clusters from one another using the terms they did not share, together with the titles and contents of the clustered posts. Although most of the labels are self-explanatory, WHAT DO YOU THINK needs further explanation. The r/Anxiety subreddit contained many posts asking for others’ opinions on topics ranging from daily experiences to family situations. While the topics varied, the cluster label WHAT DO YOU THINK reflects this shared rhetorical style. Table 2 summarizes the clustering result for the r/Anxiety subreddit and shows the themes in descending order of cluster size.

Table 2.

Characteristics of the r/Anxiety clusters and sample terms.

| Size, num. of documents (%) | Cluster label | Sample terms |
|---|---|---|
| 584 (15.88%) | living with anxiety | day, try, getting, anxious, love, want, life, years, long, days |
| 406 (11.04%) | what do you think | think, people, help, way, better, make, need, anxious, right, bad |
| 379 (10.31%) | social related | social, people, think, friends, anxious, talk, phone, life, bad, talking |
| 299 (8.13%) | medication | doctor, medication, taking, effects, help, meds, dose, lexapro, zoloft, prescribed |
| 287 (7.81%) | want/need help | want, make, help, person, need, talk, work, understand, feeling, depression |
| 284 (7.72%) | school | school, college, high, years, people, work, home, family, friends, anxious |
| 227 (6.17%) | panic attack | panic, attack, heart, anxious, felt, fear, doctor, stop, calm, breathing |
| 193 (5.25%) | positive emotion | good, great, awesome, people, hope, thank, congrats, luck, proud, happy |
| 187 (5.09%) | need help | help, need, think, people, worry, say, talk, tell, mind, scared |
| 176 (4.79%) | symptoms of anxiety disorder | feeling, anxious, panic, symptoms, life, people, fear, hard, times, stomach |
| 164 (4.46%) | work | job, work, day, people, boss, working, home, jobs, stress, anxious |
| 160 (4.35%) | gratitude | thanks, love, great, good, sharing, better, awesome, nice, glad, helped |
| 138 (3.75%) | therapy/therapist | help, therapist, doctor, therapy, appointment, talk, medication, issues, health, insurance |
| 112 (3.05%) | how others think of you | thoughts, think, thought, people, thinking, feeling, fear, person, negative, helped |
| 81 (2.20%) | sleep | sleep, night, bed, asleep, day, wake, fall, sleeping, tired, morning |

3.1.2. Depression

We generated 15 clusters for the r/Depression subreddit. Clusters including BIRTHDAY, SCHOOL, SLEEP, WORK, and GRATITUDE were clearly differentiated from one another. Some clusters, such as TALKING TO FRIENDS and FRIENDS AND FAMILY, shared identical or semantically similar terms. We distinguished these using the procedure described in the previous subsection. Table 3 summarizes the clustering result for the r/Depression subreddit and shows the themes in descending order of cluster size.

Table 3.

Characteristics of the r/Depression clusters and sample terms

| Size, num. of documents (%) | Cluster label | Sample terms |
|---|---|---|
| 354 (18.30%) | needs of depressed individuals | need, depressed, talk, doctor, mental, understand, family, medication, anxiety, therapy |
| 285 (14.74%) | depressed times | feeling, years, depressed, bad, hard, long, suicide, talk, days, end |
| 163 (8.43%) | feelings of depression | feeling, bad, depressed, end, hard, sorry, worse, feels, tried, months |
| 163 (8.43%) | negative emotion | bad, shit, trying, hard, fucking, fuck, depressed, sad |
| 147 (7.60%) | work | work, job, hard, right, working, lot, trying, hate, money, days |
| 132 (6.83%) | gratitude | thank, great, awesome, nice, job, proud, glad, amazing, beautiful, wow |
| 126 (6.51%) | talking to friends | talk, friends, say, friend, person, care, tell, said, told, talking |
| 123 (6.36%) | friends and family | friends, friend, family, best, away, love, college, great |
| 122 (6.31%) | understanding depressed individuals | depressed, understand, feeling, bad, worse, need, hard, mental, illness, problems |
| 101 (5.22%) | school | school, college, year, high, class, semester, grades, work, friends, parents |
| 95 (4.91%) | love and depression | love, happy, hate, depressed, days, feeling, great, bad, guy, relationship |
| 55 (2.84%) | positive emotion | happy, hope, luck, best, glad, great, thank, wish, awesome, love |
| 38 (1.96%) | sleep | sleep, bed, night, wake, sleeping, dreams, asleep, morning, awake, waking |
| 18 (0.93%) | birthday | birthday, happy, hope, friends, love, wish, great, celebrate, enjoy, facebook |
| 12 (0.62%) | social related | living, friends, eat, social, joke, conversation, games, media, enjoy, topic |

3.1.3. PTSD

As with the previous two subreddits, we generated 15 clusters for the r/PTSD subreddit. Many clusters, including TRAUMA THERAPY, WORK, SLEEP, TRAUMA TRIGGER, EMDR THERAPY, NIGHTMARE, SERVICE ANIMAL, and RESEARCH, were clearly distinguishable. A few clusters, such as SLEEP and NIGHTMARE, shared similar terms but also had distinctive, non-overlapping terms. They were distinguished from one another using the same procedure as above. Table 4 summarizes the clustering result for the r/PTSD subreddit and shows the themes in descending order of cluster size.

Table 4.

Characteristics of the r/PTSD clusters and sample terms.

| Size, num. of documents (%) | Cluster label | Sample terms |
|---|---|---|
| 352 (19.57%) | help for PTSD | help, need, sorry, trying, hope, therapy, friends, family, trauma, relationship |
| 319 (17.73%) | positive emotion | good, thanks, better, talk, hope, great, yes, love, sure, able |
| 259 (14.40%) | living with PTSD | years, life, want, happened, help, day, year, hard, therapy, home |
| 215 (11.95%) | trauma therapy | trauma, therapy, therapist, help, better, symptoms, abuse, brain, talk, traumatic |
| 208 (11.56%) | work | work, anxiety, panic, job, deal, new, started, months, trying, hard |
| 161 (8.95%) | wanting to help (by others) | help, want, talk, support, understand, care, symptoms, experience, health, situation |
| 59 (3.28%) | sleep | sleep, night, wake, asleep, hours, sleeping, bed, nightmares, dreams, awake |
| 45 (2.50%) | gratitude for sharing (techniques and stories) | thank, sharing, reading, posting, writing, beautiful, blog, share, powerful, appreciate |
| 37 (2.06%) | trauma trigger | trigger, triggers, makes, trauma, triggered, flashbacks, watch, warnings, music, movie |
| 36 (2.00%) | EMDR therapy | emdr, therapist, therapy, trauma, work, session, memories, sessions, helpful, effective |
| 28 (1.56%) | nightmare | nightmares, prazosin, wake, doctor, anxiety, medication, fear, sorry, therapy, dose |
| 26 (1.45%) | everyday issues | home, work, today, boyfriend, personal, hate, friends, dealing, couple, issues |
| 23 (1.28%) | service animal | dog, service, dogs, help, animal, trained, support, better, able |
| 16 (0.89%) | research | study, questions, survey, research, information, link, university, project, article, contact |
| 15 (0.83%) | misc. | good, worry, guys, sorry, thanks, work, bad, getting, bit, reading |

Figure 2 is an overview of the discussion theme network, in which blue, red, and green represent discussion themes from the r/Anxiety, r/Depression, and r/PTSD subreddits, respectively.

Figure 2.

An overview of discussion theme network. The r/Anxiety denoted by (A) is in blue, the r/Depression denoted by (D) is in red, and the r/PTSD denoted by (P) is in green.

To check the validity of our clustering method, we repeated the clustering process and manual assignment of descriptive labels 10 times, each with 15 clusters, using the same parameters and procedures. On average, k-means clustering of the r/Anxiety subreddit produced 80% overlapping terms for identically labeled clusters, and 29 unique labels were used to describe the 10 k-means clustering results. For the r/Depression subreddit, a total of 25 unique labels were used, and identically labeled clusters shared 75% of their terms on average. K-means clustering for the r/PTSD subreddit produced 69% overlapping terms for identically labeled clusters and required a total of 28 unique labels. Detailed information on the validity check process is provided in the appendix, together with all the labels and the proportion of overlap for each individual label in each subreddit.

3.2. RQ2: How much thematic overlap, similarity, and difference exists among the communities?

The three subreddits shared four discussion themes: sharing of POSITIVE EMOTION, GRATITUDE for received emotional support, and discussions related to SLEEP and WORK. Common themes are summarized as a Venn diagram in Figure 3. In the following sections, we present the results of our qualitative study of the four shared discussion themes.

Figure 3.

Summary of common themes as a Venn diagram.

3.2.1. Sharing of Positive Emotion

Sharing positive emotion is a well-documented practice in online health communities [7, 48, 58], including communities focusing on mental health [18]. Our dataset also consistently showed sharing positive emotion as a method of support. Below are canonical examples of Reddit members’ attitudes toward receiving positive emotion.

“I had a girlfriend who abused me emotionally. I left her and I don’t regret it. We need to have positive people in our lives!” - a comment from the r/Depression subreddit.

“Thanks! I still feel terrible, but I am more confident and I feel like I can get through this, with your support. This is an awesome community. As soon as I can [do so], I’ll be sure to return the favor!” - a comment from the r/PTSD subreddit.

It is not unexpected to find sharing positive emotion as a common theme in all three subreddits. Although the effect of positive emotion on clinical outcomes remains uncertain, previous literature suggests that exposure to positive emotions could alleviate negative emotion. For example, positive emotions could help individuals suffering from the distress of diseases or injuries [59], anxiety disorders [60], and chronic stresses [61, 62].

3.2.2. Showing Gratitude for Emotional Support

In our dataset, members frequently displayed gratitude for receiving positive posts, a common practice in online health communities [37]. The following are canonical examples of showing gratitude for received emotional support.

“As someone with both anxiety disorder and depression, I find your comment very helpful in my times of weakness. Thank you so much!!!” - a comment from the r/Anxiety subreddit.

“This is awesome, thank you so much for sharing your stories. Your positive reaction is overwhelming and I am thankful.” - a comment from the r/Depression subreddit.

“Thank you for this insightful post! I hope that someday this will be available to those with other types of PTSD. Positive stories like this are encouraging, thanks!” - a comment from the r/PTSD subreddit.

Similar to sharing positive emotion, showing gratitude has been suggested to be beneficial for individuals suffering from mental health conditions. For example, gratitude has led to higher levels of perceived emotional support, and lower stress and depression [63]. Moreover, making a conscious effort to acknowledge gratitude has been suggested to have emotional and interpersonal benefits [64], although further investigation regarding clinical outcomes is warranted.

3.2.3. Sleep

Sleep-related discussions were salient in all three subreddits. Subreddit members spoke of their sleep issues in general terms and asked whether others had similar experiences:

“[…] I have problems sleeping as it is […]” - a comment from the r/Anxiety subreddit.

“Who else is experiencing sleep trouble??” - a comment from the r/PTSD subreddit.

The nature of sleep-related discussions differed across the three subreddits. Awakening, being tired throughout the day, and having trouble sleeping were commonly discussed in the sleep clusters of the r/Anxiety and r/PTSD subreddits. Panic was more salient in the r/Anxiety and r/PTSD subreddits, and nightmares in the r/PTSD subreddit. For the r/PTSD subreddit, the k-means clustering algorithm yielded two sleep-related clusters: SLEEP and NIGHTMARE. The SLEEP cluster contained general sleep-related problems as well as nightmare issues, as shown below. This result is consistent with the DSM-5 classification [65] and extant literature on PTSD and sleep [66].

“[…] Whenever I see a fire or a related image, I get horrible nightmares. I wish it would stop! […]” - a comment from the r/PTSD subreddit.

In the r/Depression subreddit, sleep-related discussions were markedly different. A frequently discussed topic was feeling unrefreshed after sleeping, regardless of how long members had slept.

“[…] man, I slept for 12 hours last night but I feel like I only got a couple of hours of sleep […]” - a comment from the r/Depression subreddit.

Another major theme in the r/Depression’s sleep cluster was members’ desire to die in their sleep.

“Yeah I feel you. I have a nice car, house, and a nice life, yet every night when I go to bed, I hope I never wake up.” - a comment from the r/Depression subreddit.

Although the cluster analysis showed that all three subreddits had sleep-related discussions, our qualitative analysis revealed differences in context. In the r/Anxiety and r/PTSD subreddits, members discussed sleep troubles such as nightmares, whereas in the r/Depression subreddit, discussions centered on unrefreshing sleep and/or wishing to die, feelings that may be exacerbated at bedtime.

3.2.4. Work

As was the case with sleep, work-related discussions were prevalent in all three subreddits. In the r/Anxiety and r/PTSD subreddits, conversations concerned the difficulty of keeping, performing in, or getting a job because of members’ symptoms.

“So I had to quit my job again because of my anxiety. This has been happening for the past several years. […]” - a comment from the r/Anxiety subreddit.

“I’ve been jobless for the last couple years, so I was happy to get a new job. But now, I am always tired and worn out. I realize it’s part of adjusting to a new life. However, now I’ve started to get flashbacks and nightmares again.” - a comment from the r/PTSD subreddit.

In the r/Depression subreddit, however, work-related content concerned working too much, quitting or taking time off from work because of depression, and members venting their dislike of their work.

“[…] I have no interest in working, but I have no choice but to make a living […]” - a comment from the r/Depression subreddit.

“I had to take a leave of absence from work. So I can sort out my life stressors […]” - a comment from the r/Depression subreddit.

3.2.5. Shared themes in the r/Anxiety and r/Depression subreddits

The r/Anxiety and r/Depression subreddits shared two discussion themes: SCHOOL- and SOCIAL-related issues.

“I am forced to take some time off from school due to anxiety and depression […]” - a comment from the r/Anxiety subreddit.

“[…] I’m 16 years old. Till now, I was always homeschooled. I just now started to attend public school. […] I am socially awkward and I get anxious around students my age. But it’s getting worse since I started public school. […]” - a comment from the r/Depression subreddit.

As the examples show, many members explicitly mentioned anxiety and depression together, especially in discussions about school. This demonstrates the importance of school-related issues for both subreddits, though the foci were slightly different. For the r/Anxiety subreddit, social anxiety was the main topic of the social cluster, whereas for the r/Depression subreddit, topics ranged widely, from reminiscing about one’s past social life to using social media to cope with depression.

3.2.6. Shared themes in the r/Anxiety and r/PTSD subreddits

The r/Anxiety and r/PTSD subreddits shared one common theme, LIVING WITH their respective conditions. In both subreddits, these clusters concerned daily struggles and mundane negative experiences. Another topic common to the two subreddits was help. The r/PTSD subreddit had two help-related clusters, WANTING TO HELP and HELP FOR PTSD. Many members of the r/PTSD subreddit did not have PTSD themselves but wanted to help others who were suffering from it; we labeled this cluster WANTING TO HELP. The other cluster, in which members explicitly asked for help, was labeled HELP FOR PTSD. The r/Anxiety subreddit had both types of discussions (i.e., WANT/NEED HELP and NEED HELP); however, our qualitative analysis showed that r/Anxiety had far fewer discussions in which friends and family members asked for advice on how to help individuals with anxiety disorder. Moreover, those discussions about ‘wanting to help’ were typically clustered together with discussions about ‘asking for help’, so we called this cluster WANT/NEED HELP. The other help-related cluster in the r/Anxiety subreddit was NEED HELP, in which members expressed their needs and explicitly asked for help. Although these topics are related, we treated these clusters as distinct in Figure 3 to preserve the contextual distinction.

We found that the four common themes (POSITIVE EMOTION, GRATITUDE, SLEEP, and WORK) were not necessarily collocated in the network (Figure 4). For instance, the POSITIVE EMOTION node from the r/Depression cluster (red) is not adjacent to the POSITIVE EMOTION node from the r/Anxiety cluster (blue). However, common-theme nodes from the r/Anxiety and r/PTSD (green) subreddits are always closer to each other than to the corresponding theme node from the r/Depression subreddit. For example, the SLEEP nodes from r/Anxiety and r/PTSD are closer to each other than to the SLEEP node from r/Depression. Moreover, the LIVING WITH nodes, a theme found only in r/Anxiety and r/PTSD, are located relatively close together, whereas the SCHOOL and SOCIAL nodes, themes found only in r/Anxiety and r/Depression, are farther apart.

Figure 4.


An overview of the modularity structure of the discussion themes. Themes from r/Anxiety (A) and r/PTSD (P) fall in the same group (yellow), whereas themes from r/Depression (D) form another group (gray).

From the distances among common-theme nodes, we observed that the r/Anxiety and r/PTSD subreddits shared more common terms with each other than with the r/Depression subreddit. To validate this observation, we applied the Louvain modularity algorithm and color-coded the communities according to the modularity result (Figure 4).

The algorithm clearly divided the network into two main communities of nodes: all discussion themes from the r/Depression subreddit formed one community, and all discussion themes from the r/Anxiety and r/PTSD subreddits formed another. The heatmap (Figure 5) likewise shows that the r/Anxiety and r/PTSD subreddits are more strongly linked to each other than to the r/Depression subreddit. As in Figure 2, SOCIAL from the r/Depression subreddit and GRATITUDE from the r/PTSD subreddit share fewer words with any other node. Together, the Louvain modularity result and the heatmap support the assertion that the concerns of r/Anxiety and r/PTSD members are semantically more similar to each other than to those of r/Depression members.
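The Louvain algorithm searches for the partition that (greedily) maximizes modularity, Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the total edge weight, L_c is the weight of edges inside community c, and d_c is the summed weighted degree of the nodes in c. The minimal sketch below computes Q for a given partition of a weighted theme network; it is not the Louvain search itself (libraries such as NetworkX, in versions 2.8 and later, provide `networkx.community.louvain_communities` for that). The node names, edges, and unit weights are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Modularity Q of a weighted, undirected graph for a given partition.

    edges: iterable of (u, v, weight), each undirected edge listed once.
    communities: dict mapping node -> community id.
    """
    m = 0.0                             # total edge weight
    degree = defaultdict(float)         # weighted degree of each node
    internal = defaultdict(float)       # L_c: edge weight inside community c
    for u, v, w in edges:
        m += w
        degree[u] += w
        degree[v] += w
        if communities[u] == communities[v]:
            internal[communities[u]] += w
    community_degree = defaultdict(float)  # d_c: summed degree of nodes in c
    for node, k in degree.items():
        community_degree[communities[node]] += k
    return sum(internal[c] / m - (community_degree[c] / (2 * m)) ** 2
               for c in community_degree)


# Hypothetical theme network: two tightly linked groups joined by one edge.
theme_edges = [
    ("A:SLEEP", "P:SLEEP", 1), ("P:SLEEP", "A:WORK", 1), ("A:WORK", "A:SLEEP", 1),
    ("D:SLEEP", "D:WORK", 1), ("D:WORK", "D:SOCIAL", 1), ("D:SOCIAL", "D:SLEEP", 1),
    ("A:WORK", "D:WORK", 1),
]
nodes = {n for u, v, _ in theme_edges for n in (u, v)}
partition = {n: ("D" if n.startswith("D:") else "A+P") for n in nodes}
q = modularity(theme_edges, partition)
print(round(q, 3))  # 0.357
```

Placing every node in a single community gives Q = 0, so a clearly positive Q indicates that the two-group split captures more within-group linkage than expected by chance.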

Figure 5.


Heatmap representation of discussion themes.

3.3. RQ3: What can we understand about the overlapping member base?

We investigated the extent to which members participated in multiple subreddits to verify that the common themes were not mainly due to overlapping membership. Of 39,541 members, 2,357 (5.96%) participated in more than one of the three subreddits; these members took part in multiple topics, and their discussions were not especially concentrated in the overlapping themes. Only a small number of individuals (n = 65) participated in all three subreddits; some of them explicitly mentioned comorbidity and described their thoughts on and experiences with the conditions.

“In my opinion, Anxiety is your mind speeding up – having more thoughts and worries while depression is your body slowing down – having less energy and sleeping more. When you add slowed body to sped up mind, everything becomes out of balance. You feel like you are constantly fighting youself, and nothing gets done except that you tire yourself out.” - a comment from the r/Anxiety subreddit.

“I have Anxiety, Depression and PTSD. I often think it’s scary what’s inside my head.” - a comment from the r/Depression subreddit.

The r/Anxiety and r/Depression subreddits had a substantial number of overlapping members (n = 2,037); however, these two subreddits were also much larger than r/PTSD (Table 1). The themes most commonly discussed by the overlapping members differed between the two subreddits. The r/Anxiety and r/PTSD subreddits shared 217 members, and these overlapping members likewise discussed different themes in each subreddit. The r/Depression and r/PTSD subreddits had no common theme but shared 233 members, whose main discussions also differed between the two subreddits, suggesting that members used these subreddits for different purposes (Table 5).
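The reported counts are internally consistent, which can be checked by inclusion-exclusion: a member of all three subreddits appears in all three pairwise overlaps, so the number of members in two or more subreddits is the sum of the pairwise overlaps minus twice the three-way overlap. The sketch below performs that check and shows, under the assumption that membership means having posted or commented at least once in a subreddit, how the counts could be obtained from author sets. The toy author names are hypothetical.

```python
from itertools import combinations


def membership_overlap(members):
    """members: dict mapping subreddit -> set of author names."""
    pairs = list(combinations(sorted(members), 2))
    pairwise = {(a, b): len(members[a] & members[b]) for a, b in pairs}
    in_all = set.intersection(*members.values())
    # Anyone active in two or more subreddits lies in some pairwise overlap.
    in_two_or_more = set().union(*(members[a] & members[b] for a, b in pairs))
    return pairwise, len(in_all), len(in_two_or_more)


# Inclusion-exclusion check on the counts reported in the text.
reported_pairwise = [2037, 217, 233]   # Anx&Dep, Anx&PTSD, Dep&PTSD
reported_all_three = 65
at_least_two = sum(reported_pairwise) - 2 * reported_all_three
print(at_least_two, f"{at_least_two / 39541:.2%}")  # 2357 5.96%

# Hypothetical author sets.
toy = {
    "r/Anxiety": {"u1", "u2", "u3"},
    "r/Depression": {"u2", "u3", "u4"},
    "r/PTSD": {"u3", "u5"},
}
pairwise, n_all, n_multi = membership_overlap(toy)
```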

Table 5.

Characteristics of members who participated in multiple subreddits and most commonly discussed themes by these overlapping members

r/Anxiety and r/Depression | in r/Anxiety | in r/Depression
Number of overlapping members | 2,037 | 2,037
Total number of posts | 8,124 | 8,547
Mean (SD) posts per member | 3.99 (8.44) | 4.20 (7.41)
Most frequent theme 1 (%) | what do you think (24.57%) | depressed times (32.76%)
Most frequent theme 2 (%) | want/need help (17.23%) | gratitude (12.05%)
Most frequent theme 3 (%) | social anxiety (9.05%) | understanding depressed individual (10.82%)
Most frequent theme 4 (%) | gratitude (9.02%) | negative emotion (10.04%)
Most frequent theme 5 (%) | positive emotion (8.12%) | talking to friends (9.86%)

r/Anxiety and r/PTSD | in r/Anxiety | in r/PTSD
Number of overlapping members | 217 | 217
Total number of posts | 962 | 1,507
Mean (SD) posts per member | 4.43 (5.42) | 6.94 (10.58)
Most frequent theme 1 (%) | what do you think (22.97%) | help PTSD (26.74%)
Most frequent theme 2 (%) | want/need help (18.61%) | work (13.87%)
Most frequent theme 3 (%) | gratitude (9.36%) | trauma therapy (13.07%)
Most frequent theme 4 (%) | panic attack (9.25%) | living with PTSD (9.42%)
Most frequent theme 5 (%) | social anxiety (8.52%) | positive emotion (9.09%)

r/Depression and r/PTSD | in r/Depression | in r/PTSD
Number of overlapping members | 233 | 233
Total number of posts | 1,214 | 1,799
Mean (SD) posts per member | 5.21 (11.50) | 7.72 (12.21)
Most frequent theme 1 (%) | depressed times (40.53%) | help PTSD (28.40%)
Most frequent theme 2 (%) | negative emotion (13.10%) | work (14.34%)
Most frequent theme 3 (%) | understanding depressed individual (12.85%) | trauma therapy (12.62%)
Most frequent theme 4 (%) | gratitude (9.31%) | living with PTSD (11.17%)
Most frequent theme 5 (%) | talking to friends (7.41%) | wanting to help (9.34%)
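The per-member figures in Table 5 follow from a simple aggregation: for each pair of subreddits, find the authors active in both, then count each author’s posts in each subreddit. As a consistency check, each mean equals total posts divided by overlapping members (e.g., 8,124 / 2,037 ≈ 3.99). The minimal sketch below assumes a comment log of (author, subreddit) pairs and uses the sample standard deviation; the table does not state which SD variant was used, and the toy log is hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def overlap_post_stats(comments, sub_a, sub_b):
    """comments: iterable of (author, subreddit) pairs, one per post.

    Returns, per subreddit: (overlapping members, total posts by them,
    mean posts per member, sample SD). Needs >= 2 overlapping members.
    """
    posts = {sub_a: Counter(), sub_b: Counter()}
    for author, sub in comments:
        if sub in posts:
            posts[sub][author] += 1
    overlap = posts[sub_a].keys() & posts[sub_b].keys()
    stats = {}
    for sub in (sub_a, sub_b):
        counts = [posts[sub][author] for author in overlap]
        stats[sub] = (len(overlap), sum(counts),
                      round(mean(counts), 2), round(stdev(counts), 2))
    return stats


# Hypothetical comment log.
log = [("u1", "r/Anxiety"), ("u1", "r/Anxiety"), ("u1", "r/PTSD"),
       ("u2", "r/Anxiety"), ("u2", "r/PTSD"), ("u2", "r/PTSD"),
       ("u2", "r/PTSD"), ("u3", "r/Anxiety")]
stats = overlap_post_stats(log, "r/Anxiety", "r/PTSD")
```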

4. Discussion

Understanding the nature of discussion in similar online health communities can be challenging, especially when members share similar symptoms and comorbid conditions. In this study, we not only compared the overall discussion themes and the contextual variations within the same themes, but also identified differences in participation and discussion styles, using content from Reddit. Anxiety and depression have been reported to often co-occur in the presence of stressful and traumatic events [34, 35]; we therefore analyzed the r/Anxiety, r/Depression, and r/PTSD subreddits.

We first employed cluster analysis to identify the 15 main themes discussed in each of the r/Anxiety, r/Depression, and r/PTSD subreddits. As expected, some topics appeared in multiple subreddits; in particular, all three subreddits shared four discussion themes: POSITIVE EMOTION, GRATITUDE, SLEEP, and WORK. To gain better insight, we then qualitatively analyzed these four common themes.
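The cluster analysis can be illustrated with a minimal sketch of document clustering. This section does not describe the authors’ preprocessing or feature representation, so the sketch assumes whitespace tokenization, smoothed TF-IDF weights, and plain Lloyd’s k-means seeded with the first k documents (production implementations typically use k-means++ seeding and several restarts). The documents are invented, and k = 2 stands in for the 15 clusters per subreddit used in the study.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """L2-normalized TF-IDF vectors (dict term -> weight) for non-empty docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF keeps every weight positive, so norms are non-zero.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())


def mean_vector(vectors):
    total = defaultdict(float)
    for vec in vectors:
        for term, w in vec.items():
            total[term] += w
    return {term: w / len(vectors) for term, w in total.items()}


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means; the first k vectors seed the centroids."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # converged: no document changed cluster
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


docs = [
    "cannot sleep nightmares again tonight",
    "my boss at work stresses me out",
    "sleep problems and nightmares every night",
    "quit my job because work is too much",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)  # [0, 1, 0, 1]
```

Each document receives exactly one cluster label, which is the single-topic property discussed in the Limitations section.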

Sharing POSITIVE EMOTION and showing GRATITUDE are themes that have also been reported in past research on health-related online communities [18, 7, 48, 58], so it was not unexpected to see them expressed here. SLEEP- and WORK-related problems were salient in all three subreddits, though they were discussed in somewhat different ways. Through manual examination, we found that members of the r/Anxiety and r/PTSD subreddits described their issues differently from members of the r/Depression subreddit (see Results).

This result was corroborated by our theme network analysis, in which the Louvain modularity algorithm separated the r/Depression subreddit’s discussion themes from those of the r/Anxiety and r/PTSD subreddits. The heatmap likewise shows darker cells between r/Anxiety and r/PTSD themes than between either and r/Depression themes. Although the discussion topics were nominally the same, our results underline the need to focus on different SLEEP- and WORK-related issues for each of these conditions.

The prevalence of mentions of specific medications, treatments, and support resources also highlighted differences between the subreddits. The names of common medications were present among the top cluster keywords for the r/Anxiety and r/PTSD subreddits, and the r/PTSD clusters also included specific therapies and support resources, such as Eye Movement Desensitization and Reprocessing (EMDR) therapy and service animals. In contrast, cluster topics in the r/Depression subreddit focused more on contextual aspects of depressive episodes, such as affect (negative emotion, positive emotion), interpersonal interactions (friends and family, talking to friends, understanding depressed individuals), and situations in which depressive symptoms may occur (birthdays, love and depression).

There are a number of overlapping members among the three subreddits. Only a small fraction of members participated in all three subreddits (n = 65), but a larger number (n = 2,037) participated in both the r/Anxiety and r/Depression subreddits. However, these members discussed a variety of topics, not just the commonly shared ones. Given the small number of shared members between subreddits and the variety of topical interests they expressed, we concluded that these members were not the main reason for the commonly shared discussion themes.

Taken together, these results suggest that r/Anxiety and r/PTSD members are more likely to be individuals whose experiences with a condition are long-term and who are interested in treatments and medications. r/Depression members may be a more diffuse group, some of whom may be dealing with long-term issues and others perhaps with transient issues that cause depressed mood. This may also account for the larger size of the r/Depression subreddit: the word “depression” perhaps carries a wider set of connotations, some clinical and others not, so those who participate in this subreddit may be a more diffuse and transient group.

The contribution of this work is twofold. First, we illustrated differences in the nature of online discussion in communities for conditions with similar symptoms and comorbidity. Our findings inform more nuanced discussion and suggest that researchers employ multiple methods to fully understand differences among conditions with shared symptomatology. Second, from a practical perspective, understanding these subtle differences in the nature of online discussion could be used to inform the design of online mental health communities and patient education programs for these conditions.

5. Limitations and Future Directions

This study has various limitations. First, this study employed data from one social networking site. As mentioned in the Introduction, Reddit is a widely used platform, but it is more frequently used by certain demographic segments, particularly by younger males [67, 68]. This bias toward a younger audience enabled us to identify particular areas of interest of members, such as school and work. However, in future studies it would also be useful to examine other online health communities that address these conditions to better characterize the needs of people who experience the conditions, but may not be represented in the Reddit community.

Second, the topic of an online discussion is prone to change as the discussion progresses [69]. We expect many of the longer discussions (i.e., those with more comments; see Table 1) to cover multiple topics; however, our method of analysis identifies only a single topic for each discussion. Thus, other machine learning algorithms, such as latent Dirichlet allocation, which can assign multiple topics to a single document, could produce different results. Moreover, misspellings, abbreviations, contractions, and community-specific nomenclature are common in online health communities [70]. A high prevalence of such cases could alter the clustering result by changing the overall counts of important terms; however, we did not encounter such cases during our manual examination of the most frequently occurring terms for each cluster.

Third, the number of words used to calculate thematic similarity could have influenced the rendering of our visualizations. In our method, we prespecified the use of the 20 most frequently occurring words of each cluster to determine the edge weights between nodes in the network visualization and the proximity of discussion themes in the heatmap. Had we considered a larger number of words, the overall visualizations could look different. For instance, the two outlier themes (i.e., SOCIAL from the r/Depression subreddit and GRATITUDE from the r/PTSD subreddit) could share more words with other themes.
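The dependence on the number of top words can be made concrete with a small sketch. It assumes that the edge weight between two theme nodes is the count of words their top-word lists share (the exact weighting is not specified in this section), uses naive whitespace tokenization without the study’s preprocessing, and builds the symmetric matrix that a heatmap would display. The theme names and word sets are hypothetical, and n = 3 replaces 20 for brevity.

```python
from collections import Counter
from itertools import combinations


def top_words(texts, n=20):
    """Set of the n most frequent whitespace tokens in a cluster's texts."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    return {word for word, _ in counts.most_common(n)}


def shared_word_weights(top_sets):
    """Edge weight = number of top words two theme nodes share (0 => no edge)."""
    weights = {}
    for a, b in combinations(top_sets, 2):
        shared = len(top_sets[a] & top_sets[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def weight_matrix(names, weights):
    """Symmetric matrix of edge weights (diagonal left at 0), e.g. for a heatmap."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for (a, b), w in weights.items():
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = w
    return matrix


# Hypothetical top-word sets (n = 3 instead of 20).
top_sets = {
    "A:SLEEP": {"sleep", "night", "panic"},
    "P:SLEEP": {"sleep", "night", "nightmare"},
    "D:SLEEP": {"sleep", "tired", "die"},
}
weights = shared_word_weights(top_sets)
print(weights)
# {('A:SLEEP', 'P:SLEEP'): 2, ('A:SLEEP', 'D:SLEEP'): 1, ('P:SLEEP', 'D:SLEEP'): 1}
```

Recomputing the sets with a larger n can only keep or add shared words for a pair, which is why outlier themes might gain edges under a larger vocabulary.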

This difference in group composition provokes some interesting questions. First, does the difference in content suggest different usage intents on the part of subreddit members? Second, if so, do the subreddits fulfill the needs of these different types of members equally well? Short-lived participation is generally viewed as a challenge in managing online health communities, owing to issues like lurking [71] and dropping out [72, 73]. If subreddit members visit to obtain a solution to a transient issue or simply to vent and move on, their needs may be fulfilled without their contributing much to the community. By contrast, those looking for longer-term solutions have different informational and emotional support needs. How to design online health communities that support both types of needs and members while sustaining the overall activity of the communities is an unanswered question.

Beyond group composition, one might consider what features online health communities could provide to help users find content [74] or members [75, 76] that are important to them. Based on the results presented in this paper, it could be useful to provide interactive functionality that lets members not only locate posts and other members [77] discussing particular medications and treatments, but also identify content based on contextual elements of experience, such as social occasions or the need for understanding. Considering the temporality of participants’ experiences (e.g., long-term or transient) is also paramount.

Though beyond the scope of this study, it would be interesting to consider how the informational and emotional support content of these communities compares to the content that is delivered in Internet-based interventions for anxiety, depression and PTSD [78, 79]. Although potential overlaps of users between online health communities and Internet-based interventions may exist, it is unlikely that any of these avenues could reach the entire population who are suffering from these mental health conditions [80]. Thus, understanding what each of these different avenues can and cannot offer is important.

6. Conclusion

In this study, we compared online discussion content from three online mental health communities concerning conditions that share similar symptoms and can be comorbid. More specifically, we collected data from Reddit, a highly popular social media platform, and analyzed content from three subreddits focusing on anxiety, depression, and PTSD. First, we employed cluster analysis to identify the top 15 discussion themes for each subreddit. Second, we combined text mining, visualization, and qualitative analysis methods to identify thematic similarities and differences among the three subreddits. Through text mining and qualitative analyses, we observed that members of the three communities shared overlapping concerns (i.e., sleep- and work-related problems) and discussion patterns (i.e., sharing positive emotion and showing gratitude for emotional support received), but we also revealed contextual variations in these themes across the three communities. By rendering a network visualization of the topics discussed and applying a community detection algorithm to this network, we showed that discussions from the r/Anxiety and r/PTSD subreddits were more similar to one another than to discussions from the r/Depression subreddit, and we used a heatmap to support a closer examination of these similarities and differences. We further supported this finding by examining the participation and discussions of shared members. The findings from this study could be used to inform the design of online mental health communities and patient education programs for these conditions. Moreover, our findings inform more nuanced discussion and suggest that researchers employ multiple methods to fully understand differences among conditions with shared symptomatology.

AP’s contribution to this research was supported by the National Library of Medicine of the National Institutes of Health under award number T15 LM007124.

MC’s contribution to this research was supported by the National Library of Medicine of the National Institutes of Health under award numbers R00LM011393 & K99LM011393.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

  • Compares the nature of online discussion from online mental health communities.

  • Identifies the common themes as well as the contextual variations in common themes.

  • Highlights the differences in participation and discussion styles.

Appendices

A. All labels used to describe 10 k-means clustering results for the r/Anxiety subreddit.

Cluster label | Occurrences in 10 k-means runs | Overlapping vocabulary (%)
medication | 10 | 89.42
misc. | 10 | 41.40
panic attack | 10 | 94.18
sleep | 10 | 88.13
therapy/therapist | 10 | 80.83
work | 10 | 90.93
living with anxiety | 9 | 74.94
positive emotion | 9 | 71.39
social anxiety | 9 | 94.28
how others think of you | 8 | 62.76
want/need help | 8 | 65.64
what do you think | 8 | 77.79
congratulation | 7 | 84.57
school | 7 | 70.48
anxious feeling | 6 | 68.13
heart attack and panic attack | 3 | 82.67
gratitude | 2 | 90.00
medication and symptoms | 2 | 66.00
symptoms of anxiety disorder | 2 | 80.00
animal | 1 |
anxiety and relationship | 1 |
anxiety symptoms | 1 |
depression | 1 |
different medication experiences | 1 |
do I have anxiety | 1 |
driving | 1 |
mental disorder | 1 |
need help | 1 |
school and driving | 1 |
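The appendix tables do not define how the overlapping-vocabulary percentage is computed. One plausible reading, offered here only as an assumption, is the mean pairwise overlap between the top-word lists of clusters that received the same label in different k-means runs; under that reading, a label that occurred in only one run has nothing to compare against, which is consistent with the blank cells for single-occurrence labels. The sketch below implements that hypothetical definition with invented word sets; the list length (`n_top`) is a parameter, not a value taken from the study.

```python
from itertools import combinations
from statistics import mean


def vocabulary_overlap(top_word_sets, n_top):
    """Mean pairwise overlap (%) among the top-n_top word sets of clusters
    that received the same label in different runs (hypothetical metric).

    Returns None when the label occurred in fewer than two runs.
    """
    if len(top_word_sets) < 2:
        return None
    overlaps = [100 * len(a & b) / n_top
                for a, b in combinations(top_word_sets, 2)]
    return round(mean(overlaps), 2)


# Hypothetical top-word sets for one label found in three runs.
runs = [
    {"sleep", "night", "tired", "bed"},
    {"sleep", "night", "tired", "awake"},
    {"sleep", "night", "panic", "dream"},
]
print(vocabulary_overlap(runs, n_top=4))  # 58.33
```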

B. All labels used to describe 10 k-means clustering results for the r/Depression subreddit

Cluster label | Occurrences in 10 k-means runs | Overlapping vocabulary (%)
misc. | 10 | 18.48
birthday | 10 | 78.93
need for depression | 10 | 65.29
school | 10 | 88.13
sleep | 10 | 99.60
talking to friends | 10 | 78.58
depressed times | 9 | 64.61
gratitude | 9 | 80.61
feelings of depression | 8 | 60.86
love and depression | 8 | 78.14
work | 8 | 84.86
positive emotion | 7 | 52.10
understanding depressed individual | 7 | 80.86
congratulation | 5 | 80.60
games | 5 | 72.80
friends and family | 4 | 68.00
loss and depression | 4 | 83.67
negative emotion | 4 | 62.67
suicide | 4 | 89.00
medication | 2 | 48.00
reddit | 2 | 92.00
music | 1 |
talking | 1 |
social | 1 |
weather | 1 |

C. All labels used to describe 10 k-means clustering results for the r/PTSD subreddit

Cluster label | Occurrences in 10 k-means runs | Overlapping vocabulary (%)
misc. | 21 | 20.31
nightmare | 10 | 86.80
animal | 10 | 77.82
EMDR therapy | 10 | 78.40
wanting to help | 10 | 73.96
work | 9 | 72.06
gratitude for sharing (techniques and stories) | 9 | 52.89
trauma therapy | 8 | 68.86
living with PTSD | 8 | 73.43
help for PTSD | 8 | 84.71
positive emotion | 7 | 69.71
memory | 7 | 65.62
wanting to talk | 6 | 68.00
trauma trigger | 5 | 62.00
therapy | 4 | 71.33
sleep | 2 | 80.00
symptoms and treatment | 2 | 42.00
getting better | 2 | 72.00
anxiety | 2 | 66.00
diagnosis | 2 | 38.00
research | 1 |
anger | 1 |
wanting to understand | 1 |
everyday issues | 1 |
military | 1 |
driving anxiety | 1 |
sexual | 1 |
doctor | 1 |

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1. Here broadly defined to include Internet discussion communities like Reddit.

2. To maintain clarity, we use “r/” followed by a fixed-width style when anxiety, depression, and PTSD refer to communities.

3. To maintain clarity, we use small caps when referring to cluster themes.

References
