Author manuscript; available in PMC: 2019 Aug 27.
Published in final edited form as: 2018 IEEE Int Conf Healthc Inform Workshop (2018). 2018 Jul 19;2018:51–52. doi: 10.1109/ICHI-W.2018.00015

Assessing Mental Health Signals among Sexual and Gender Minorities using Twitter Data

Yunpeng Zhao 1,*,#, Yi Guo 1,*,#, Xing He 1, Jinhai Huo 2, Yonghui Wu 1, Xi Yang 1, Jiang Bian 1,+
PMCID: PMC6711604  NIHMSID: NIHMS998892  PMID: 31456873

Abstract

Sexual and gender minorities’ (SGMs) mental health needs remain little understood. Because of stigma and discrimination, SGMs are often unwilling to self-identify and reluctant to participate in traditional surveys. On the other hand, social media platforms have brought rapid changes to the health communication landscape and provided a new data source for health surveillance of vulnerable populations. In this study, we explored machine learning methods to identify SGM individuals by finding their self-identifying tweets, and then applied a lexicon-based text analysis method to extract emotion and mental health signals from SGMs’ Twitter timelines. We found that 1) SGM people expressed more negative feelings in their tweets, and 2) within SGM populations, gay and genderfluid individuals tended to use more words related to negative emotions, anger, anxiety, and sadness in their tweets.

Keywords: sexual and gender minorities, social media, mental health, Twitter

I. INTRODUCTION

Sexual and gender minorities (SGMs) are individuals whose gender identity or sexual orientation and practices differ from the majority of the population. Prior studies have documented a high prevalence of mental health distress in SGM populations. However, mental health issues among SGMs remain understudied, mainly due to lack of successful routine health surveillance efforts. Because of stigma and discrimination, SGMs are often unwilling to self-identify as SGMs and reluctant to participate in traditional surveys.

Meanwhile, social media and online participatory platforms have brought rapid changes to the health communication landscape. Twitter has been successfully used to recruit research participants [1], including those from vulnerable populations such as SGMs [2]. However, no study has used these rich user-generated health data to understand SGMs’ health statuses and health behaviors, especially their mental health needs.

In this study, we aim to assess SGMs’ mental health issues using text analysis methods. In particular, we aim to answer the following two research questions (RQs).

  • RQ1: Do SGM individuals experience different affect processes compared with non-SGM people when discussing gender identities and sexual orientations?

  • RQ2: Do different SGM subpopulations experience different emotional states when discussing their gender identities and sexual orientations?

II. METHODS

Our approach started with collecting tweets that were relevant to the discussion of SGM-related issues, determining self-identifying tweets and SGM Twitter users, and then assessing these users’ emotional states through text analysis.

A. Step 1: Data Preprocessing

In our previous studies [3], a list of SGM related keywords such as “transwomen” was developed through a snowball sampling process to ensure coverage. The data were collected through our Python tool [4] based on this list of keywords.

We first removed tweets that 1) were non-English or 2) were not posted in the US. For further analysis, we also 1) removed hyperlinks, 2) removed mentions, and 3) converted hashtags into original English words (e.g., converted “#gay” to “gay”).
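The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual tool: the regular expressions, the function names, and the flat `lang` / `country_code` fields on each tweet record are assumptions made for the example.

```python
import re

# Assumed patterns; the original tool's exact rules are not described.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
SPACE_RE = re.compile(r"\s+")


def keep_tweet(tweet: dict) -> bool:
    """Keep only English tweets posted in the US.

    The 'lang' and 'country_code' keys are hypothetical field names.
    """
    return tweet.get("lang") == "en" and tweet.get("country_code") == "US"


def clean_tweet(text: str) -> str:
    """Remove hyperlinks and mentions, and turn '#gay' into 'gay'."""
    text = URL_RE.sub(" ", text)        # 1) remove hyperlinks
    text = MENTION_RE.sub(" ", text)    # 2) remove mentions
    text = HASHTAG_RE.sub(r"\1", text)  # 3) keep the hashtag word, drop '#'
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    sample = {"text": "Proud to be #gay @friend https://t.co/abc",
              "lang": "en", "country_code": "US"}
    if keep_tweet(sample):
        print(clean_tweet(sample["text"]))
```

Note that this sketch only strips the `#` character; hashtags made of several joined words would need an additional word-segmentation step, which is not shown here.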

B. Step 2: Tweet Classification

We developed a two-step process leveraging two classification models to categorize the tweets into 3 groups (i.e., irrelevant, relevant but NOT self-identifying, and relevant AND self-identifying).

We used the annotated data (i.e., 6,058 tweets) from our previous study [3] as the training data and experimented with two different classification methods (i.e., random forest and convolutional neural network, CNN). The model with the best performance was adopted to identify self-identifying tweets.
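One way to picture the two-step process is as a cascade in which a first model screens for relevance and a second model is applied only to relevant tweets. The sketch below assumes that ordering and uses keyword-based stand-ins (`toy_is_relevant`, `toy_is_self_identifying`) in place of the trained random forest or CNN models; all names are hypothetical and for illustration only.

```python
import re
from typing import Callable

# A classifier is modeled as any function from tweet text to a yes/no decision.
Classifier = Callable[[str], bool]

IRRELEVANT = "irrelevant"
RELEVANT_ONLY = "relevant but NOT self-identifying"
SELF_IDENTIFYING = "relevant AND self-identifying"


def cascade_label(text: str,
                  is_relevant: Classifier,
                  is_self_identifying: Classifier) -> str:
    """Apply the two models in sequence to obtain one of three labels."""
    if not is_relevant(text):
        return IRRELEVANT
    if is_self_identifying(text):
        return SELF_IDENTIFYING
    return RELEVANT_ONLY


# Toy stand-ins for the trained models (assumptions, not the paper's models).
TOY_KEYWORDS = {"gay", "lesbian", "transman", "transwoman", "genderfluid"}


def toy_is_relevant(text: str) -> bool:
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return bool(tokens & TOY_KEYWORDS)


def toy_is_self_identifying(text: str) -> bool:
    lowered = text.lower()
    return "i am" in lowered or "i'm" in lowered


if __name__ == "__main__":
    for tweet in ["Nice weather today",
                  "Article about transwoman athletes",
                  "I am genderfluid"]:
        print(tweet, "->",
              cascade_label(tweet, toy_is_relevant, toy_is_self_identifying))
```

Because each step is just a callable, either stand-in could be swapped for a trained model's prediction function without changing the cascade logic.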

Self-identified SGMs were categorized by two reviewers into specific sexual orientations (i.e., gay, lesbian) and gender identities (i.e., transman, transwoman, and genderfluid).

C. Step 3: Linguistic Inquiry and Word Count (LIWC)

The Linguistic Inquiry and Word Count (LIWC) is a validated text analysis tool that computes the percentage of words in a text that reflect different emotions, thinking styles, social concerns, and sentiments of the writer.
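The core idea of a lexicon-based word count can be illustrated with a small sketch. The dictionary below is a toy, hypothetical lexicon invented for this example; it is not the LIWC dictionary, and the category names, tokenizer, and exact-match rule are all simplifying assumptions.

```python
import re
from typing import Dict, Set

# Toy lexicon for illustration only; NOT the actual LIWC dictionary.
TOY_LEXICON: Dict[str, Set[str]] = {
    "posemo": {"happy", "love", "proud"},
    "negemo": {"sad", "hate", "afraid", "angry"},
    "anger": {"hate", "angry"},
    "anxiety": {"afraid", "worried"},
    "sadness": {"sad", "lonely"},
}


def category_percentages(text: str,
                         lexicon: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, per category, the percentage of words found in that category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {category: 0.0 for category in lexicon}
    return {
        category: 100.0 * sum(word in vocab for word in words) / len(words)
        for category, vocab in lexicon.items()
    }


if __name__ == "__main__":
    print(category_percentages("sad sad happy afraid", TOY_LEXICON))
```

A word may belong to several categories at once (here, "sad" counts toward both the negative emotion and sadness categories), so the percentages need not sum to 100.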

We applied the LIWC tool on the entire Twitter timelines of the corresponding SGM subgroups and compared the emotional states of the different user groups.

III. RESULTS

A. Data Source

We collected over 20 million tweets from January 17, 2015 to May 12, 2015. After filtering out non-English tweets and non-US users, we retained 368,518 tweets for further analysis.

B. Tweet Classification

The random forest models outperformed the CNNs on both classification tasks, and 2,395 users were classified as self-identifying. We further manually annotated these 2,395 users into SGM subcategories (excluding 218 false positives): straight (38), gay (20), lesbian (6), transwoman (138), transman (45), and genderfluid (142).

C. Sentiments and Mental Health Issues of the Self-identified Sexual and Gender Minorities on Twitter

We then applied the LIWC tool on the different SGM population groups’ Twitter timelines to answer the two RQs.

To answer RQ1, since we had 2,070 SGM users (excluding 38 straight cases), we first randomly selected 2,070 users from the ‘Relevant but NOT self-identifying’ user group as the control group. The LIWC results revealed that the SGM group had a higher negative emotion score and expressed more anger, more anxiety, and more sadness in their tweets.
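The size-matched control selection and group comparison can be sketched as below. This is an illustrative outline under stated assumptions: the fixed seed, the function names, and the use of a simple difference in mean scores are choices made for the example, since the text does not specify how the groups were compared.

```python
import random
from statistics import mean
from typing import Iterable, List, Sequence


def sample_control(candidates: Iterable[str], n: int, seed: int = 0) -> List[str]:
    """Randomly draw n distinct users (without replacement) as a control group.

    The seed is an assumption added so the draw is reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(list(candidates), n)


def mean_difference(group_scores: Sequence[float],
                    control_scores: Sequence[float]) -> float:
    """Difference in mean category score between a group and its control."""
    return mean(group_scores) - mean(control_scores)


if __name__ == "__main__":
    # Hypothetical user IDs and scores, for illustration only.
    pool = [f"user_{i}" for i in range(100)]
    control = sample_control(pool, 10)
    print(len(control), "control users drawn")
    print(mean_difference([2.0, 4.0], [1.0, 3.0]))
```

In practice, a statistical test on the per-user scores would be needed to judge whether such a difference is meaningful; that step is outside this sketch.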

To answer RQ2, we further stratified our LIWC analysis by sexual orientation and gender identity within the ‘Relevant AND self-identifying’ user group. As shown in Fig. 1 and Fig. 2, users with different sexual orientations and different gender identities clearly expressed different emotional states in their Twitter timelines. For example, as shown in Fig. 1, gay people showed both more positive and more negative emotions, and expressed more anger, anxiety, and sadness than the other two groups. As shown in Fig. 2, genderfluid people (whose gender varies over time) expressed more negative emotion, anger, anxiety, and sadness than the other groups (transwoman and transman).

Fig. 1.

The emotion states (A: positive vs. negative; B: anger vs. anxiety vs. sadness) expressed in SGM individuals’ tweets across different sexual orientation groups.

Fig. 2.

The emotion states (A: positive vs. negative; B: anger vs. anxiety vs. sadness) expressed in SGM individuals’ tweets across different SGM groups with different gender identities.

IV. DISCUSSION AND CONCLUSION

SGM people face extreme challenges from societies that breed stigma and prejudice, which push them towards the margins and lead to discrimination and abuse, with alarming consequences that damage not only their physical health but, more significantly, their mental health. Our results suggested that SGM individuals expressed more negative feelings in their tweets compared with non-SGM people. Further, among SGM people with different sexual orientations, gay people tended to use words related to negative emotions, anger, and sadness in their tweets more often than other subgroups. Among SGM people with different gender identities, tweets from genderfluid people contained more mental health-related signals (i.e., negative emotions, anger, and anxiety). However, Twitter as a data source has its limitations. First, because data collection is passive, we are not able to capture data from SGM individuals who are less vocal on social media platforms. Second, the validity of our social media findings needs to be compared with results obtained from other sources, such as national surveys.

Nevertheless, our study demonstrated the feasibility of using Twitter data as a public health surveillance tool to identify mental health signals in the vulnerable SGM population. Further in-depth investigations are warranted.

ACKNOWLEDGMENT

This work was supported in part by the National Institutes of Health (NIH) grant UL1TR001427 and National Science Foundation grant #1734134.

REFERENCES

  • [1] Burton-Chase AM, Parker WM, Hennig K, Sisson F, and Bruzzone LL, “The Use of Social Media to Recruit Participants With Rare Conditions: Lynch Syndrome as an Example,” JMIR Res. Protoc., vol. 6, no. 1, p. e12, January 2017.
  • [2] Buckingham L et al., “Going social: Success in online recruitment of men who have sex with men for prevention HIV vaccine research,” Vaccine, vol. 35, no. 27, pp. 3498–3505, 14 2017.
  • [3] Hicks A et al., “Mining Twitter as a First Step toward Assessing the Adequacy of Gender Identification Terms on Intake Forms,” AMIA Annu. Symp. Proc. AMIA Symp, vol. 2015, pp. 611–620, 2015.
  • [4] Bian J et al., “Mining Twitter to Assess the Public Perception of the ‘Internet of Things,’” PloS One, vol. 11, no. 7, p. e0158450, 2016.
