. 2022 Aug 2;27(2):443–453. doi: 10.1007/s10461-022-03779-2

Deep learning for topical trend discovery in online discourse about Pre-Exposure Prophylaxis (PrEP)

Andy Edinger 1, Danny Valdez 2, Eric Walsh-Buhi 1, Johan Bollen 2,3
PMCID: PMC9344253  PMID: 35916950

Abstract

Pre-Exposure Prophylaxis (PrEP) interventions are increasingly prevalent on social media. These data can be mined for insights about PrEP that may not be as apparent in surveys including personal musings about PrEP and barriers/facilitators to PrEP uptake. This study explores online discourse about PrEP using an interdisciplinary public health and computational informatics approach. We collected (N = 4,020) tweets using Twitter’s Application Programming Interface (API). These data underwent a three-step neural network/deep learning process to identify clusters within these tweets and relative similarity/dissimilarity between clusters. We identified 25 distinct clusters from our original collection of tweets. These clusters represent general information about PrEP, how PrEP is communicated among diverse groups, and potential pockets of misinformation and disinformation regarding PrEP. Specific clusters of interest include discussions of medication side effects, social perception of PrEP usage, and concerns with costs and barriers to access of PrEP interventions. Our approach revealed diverse ways PrEP is contextualized online. Importantly this information can be leveraged to identify points of possible intervention for disinformation and misinformation about PrEP.

Supplementary Information

The online version contains supplementary material available at 10.1007/s10461-022-03779-2.

Keywords: Deep Learning, PrEP, Social Media, Twitter

Introduction

Pre-Exposure Prophylaxis (PrEP) is a daily oral or injectable medication for groups at highest risk of contracting the Human Immunodeficiency Virus (HIV), including men who have sex with men (MSM), transgender women, and people who inject drugs [1]. When taken as prescribed, PrEP is shown to reduce HIV-1 infection by 75% or more when coupled with safe-sex and/or clean needle sharing practices [2, 3]. To date, about 1 million adults globally have begun PrEP uptake with representative adoption rates across gender, race, and ethnicities [4].

Prior to PrEP’s advent, HIV awareness, prevention, and outreach materialized as public health education campaigns on HIV mitigation strategies [2]. Though the success of these programs varied, PrEP’s Food and Drug Administration (FDA) approval during the social media era also introduced novel mediums to rapidly disseminate information about HIV and HIV prophylaxis as prevention [5]. Indeed, digital and e-health (broad categories of multidisciplinary public health science at the intersection of technology and healthcare) have played a pivotal role in creating online and mobile knowledge and awareness campaigns to spread information about PrEP and its benefits rapidly [1, 5–7]. Many of these campaigns are naturally tailored to appeal to groups at higher risk of HIV exposure, including younger audiences (ages 18–29) and racially diverse MSM [5, 8]. The success of social media campaigns led to an 880% increase in PrEP adoption among key groups since 2012 [9].

A natural consequence of online, digital health, e-health, and mHealth interventions and marketing campaigns is a paper trail of online discourse among social media users sharing and/or discussing PrEP in diverse contexts. This allows, beyond obvious analyses of intervention effectiveness online, the opportunity to mine these data for deeper, niche discussions about PrEP that may not be apparent in surveys or data derived from interventions themselves, including potential misinformation about PrEP usage and safety, ongoing lawsuits against Gilead Pharmaceuticals, the manufacturer of the two FDA-approved PrEP medications (Truvada® and Descovy®) among other potential determinants and barriers to PrEP uptake [10].

Social Media Mining and Public health

Beyond PrEP, social media’s inescapable role in the public lexicon has shifted approaches to studying health behavior. Over two-thirds of the US population use at least one social media platform daily, resulting in billions of unique data points that are spontaneous, open-source, diachronic, and open-ended [11]. A multitude of studies have conclusively demonstrated that markers of health behavior can be meaningfully extracted from social media data collected en masse. This includes extracting mental health markers, such as cognitive distortions [12], subjective well-being indicators from individual social media timelines [13], constructing ego-networks from friendship lists [14], and even simply identifying common topics or themes within millions of tweets (i.e., posts on Twitter), or postings on other social media platforms [15].

To bridge the fields of computational informatics and public health, specific calls for interdisciplinary collaborations between these fields have been proposed. Valdez et al. (2021) highlighted several strategies for bridging computational informatics and public health with Natural Language Processing (NLP) methods. Valdez, Patterson, and Prochnow (2021) have also called for the harmonious application of public health and social network theory to lend context and/or qualitative prediction to social media studies. Collectively, this body of work argues that computational informatics methods coupled with public health frameworks can yield rich and nuanced findings about a given event, including tracking online discourse amid medical innovations and ascertaining belief systems for them.

Social media mining and PrEP: an interdisciplinary analysis

As an opportunity to study information dissemination and synthesis from a joint public health/computer science perspective, PrEP’s evolution from a medical novelty into a life-saving necessity and LGBTQIA+ cultural phenomenon may be particularly impactful. As a semi-novel yet revolutionary medication poised to continue reshaping HIV mitigation, insights from such an interdisciplinary analysis can identify salient discussion points about PrEP, gaps in PrEP knowledge, and points of intervention from a policy perspective. Therefore, the purpose of this study is to explore online discourse about PrEP using an interdisciplinary public health and computational informatics approach. Our study is guided by two research questions:

  1. Can we meaningfully consolidate PrEP related tweets into emerging themes or ideas?

  2. How can interdisciplinary frameworks add additional nuance to online conversations about PrEP and other Public health topics more broadly?

Insights from this study will contribute to reshaping our approach to mining social media data for medical novelties. By leveraging tools and strategies from multiple fields (in this case, public health and computational informatics), we will ultimately glean a deeper understanding of how medical novelties are meaningfully communicated in online spaces, both in positive and negative contexts.

Methods

Data

Data germane to this study were collected from tweets posted between June 1, 2018 and May 31, 2021. All data were obtained via a data repository that continuously queries Twitter’s Application Programming Interface (API). Twitter’s API allows developers to query and archive an estimated 1% of total daily tweet volume for a given search term. As these tweets are only available in real-time, this 1% sample constitutes the maximum data available for analysis. However, this rate of collection remains the standard for data derived from Twitter. For our study, specifically, we collected a series of tweets containing key words relating to Pre-Exposure Prophylaxis (PrEP), PrEP usage, and associated PrEP medications: Truvada, Descovy, #PrEP, “pre-exposure prophylaxis”, #truvada, #descovy, #truvadaprep, #descovyprep, #truvadaforprep, #descovyforprep. We filtered our data to remove duplicate and non-English tweets. Our total collection of tweets (i.e., a corpus) was a random sample of N = 4,020 tweets, deemed to be representative of PrEP-related discourse yet analyzable by our domain experts. All data were saved into a single repository where they were scrubbed of identifying information. Our use of these data conformed to the Institutional Review Board (IRB) standards for data security and privacy for secondary data analysis.
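The keyword matching, de-duplication, and random sampling described above can be sketched as follows. This is a minimal illustration and not the authors' collection code: the function names, the whitespace and case normalization used to detect duplicates, and the fixed random seed are all assumptions, plain substring matching is a simplification (for instance, "#prep" would also match longer hashtags), and the non-English filter is omitted because the text does not say how language was detected.

```python
import random

# Search terms listed in the Data section, lowercased for matching.
KEYWORDS = [
    "truvada", "descovy", "#prep", "pre-exposure prophylaxis",
    "#truvada", "#descovy", "#truvadaprep", "#descovyprep",
    "#truvadaforprep", "#descovyforprep",
]


def matches_keywords(text):
    """Return True if the tweet text contains any search term (substring match)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def build_corpus(tweets, sample_size, seed=0):
    """Keep keyword-matching tweets, drop duplicates, then draw a random sample."""
    seen = set()
    kept = []
    for text in tweets:
        # Hypothetical duplicate rule: same text after case/whitespace normalization.
        key = " ".join(text.lower().split())
        if matches_keywords(text) and key not in seen:
            seen.add(key)
            kept.append(text)
    if len(kept) <= sample_size:
        return kept
    return random.Random(seed).sample(kept, sample_size)


example_tweets = [
    "Started Truvada today",
    "started  truvada TODAY",   # duplicate after normalization
    "Nice weather",             # no search term
    "#PrEP works",
]
example_corpus = build_corpus(example_tweets, sample_size=10)
```

In this toy input, the second tweet is dropped as a duplicate and the third because it contains no search term.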

Analyses

We leveraged three broad classes of computational informatics methods and algorithms to analyze our data: (1) Sentence Bidirectional Encoder Representations from Transformers (S-BERT); (2) Principal Component Analysis (PCA) with Uniform Manifold Approximation and Projection (UMAP); and (3) K-means Clustering. These methods have been extensively used to analyze social media data (see Karisani & Karisani (2021) for an example of S-BERT and social media mining). These tools have also been applied in public health contexts [19].

Gauging tweet similarity with Sentence Bidirectional Encoder Representations from Transformers (S-BERT). S-BERT is an extension of the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT). The BERT approach uses neural networks to detect and map patterns in large-scale text data [20, 21]. BERT is trained with large-scale text data from which it learns numerical representations of text semantics by analyzing matching sequences of words [22]. The resulting representations then allow quantitative comparisons of the similarity of two texts.

Likewise, as shown in Fig. 1, S-BERT generates a numerical vector representation of each tweet (taken as a “sentence”) that represents its semantics. This vector can be numerically compared to the similarly generated S-BERT vectors of other tweets using standard distance or similarity metrics, e.g., cosine similarity. The latter is commonly used to gauge the degree of alignment between two vectors, which varies from zero (representing orthogonality, i.e., dissimilarity) to one (representing collinearity, i.e., similarity).

Fig. 1.

S-BERT transforms tweets or sentences into 384 × 1 vectors whose similarity or distance can be compared using common distance metrics such as cosine similarity. The resulting similarity quantitatively expresses the semantic similarity between the respective tweets by gauging how well-aligned their S-BERT vectors are in a 384-dimensional semantic space (here depicted as 2-dimensional for visual simplicity). Prototypical tweet texts are displayed to exemplify similar and dissimilar classifications per our analysis.

Provided that S-BERT vectors represent the semantics of two respective tweets, we can thus determine the degree of semantic similarity of these tweets by calculating the cosine similarity of their respective S-BERT vectors. For example, a tweet “I am concerned about the side-effects of my PrEP treatment” would be translated to a specific 384 × 1 vector representing its content (384 × 1 is the default dimensionality for pre-trained S-BERT), while another tweet “I don’t think my PrEP pills are safe because of the effects it has on my mood” would be translated to another 384 × 1 vector. The cosine similarity between the two respective vectors is 0.624, indicating they are moderately well-aligned, and thus similar in meaning (both describe concerns about PrEP treatment), regardless of whether they use or do not use the exact same wording. On the other hand, the cosine similarity between “I am concerned about the side-effects of my PrEP treatment” and “I can’t afford the costs of my monthly PrEP treatment” would be 0.502, a lower value (the two tweets describe different concerns about PrEP treatment).
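The cosine similarity used in this comparison can be computed directly from two vectors, as in the sketch below. The short vectors are toy stand-ins for the 384-dimensional S-BERT embeddings, and the function name is illustrative. For arbitrary real-valued vectors, cosine similarity can also be negative, with a lower bound of -1.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for sentence embeddings.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])            # unrelated directions
collinear = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # same direction
```

Orthogonal vectors score zero and collinear vectors score one, matching the two ends of the scale described in the text.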

Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). The S-BERT vectors that are retrieved for each tweet are high-dimensional (D = 384) numerical indicators of the semantics of their content, and thus need to be projected to a lower-dimensional space (D = 2) for visualization. PCA and UMAP are common techniques for dimensionality reduction that were employed to facilitate analysis of sentence embeddings [23–25]. PCA extracts the principal components, a set of uncorrelated variables ordered so that each successive component captures the largest share of the variance remaining in the data. The original data can be projected onto the most significant components, thereby retaining as much of the variation of the original data as possible in a new, lower-dimensional space and assigning to each data point a new coordinate in this reduced space. Similarly, UMAP reduces data to a 2-dimensional visualization that preserves the similarities between each data point and its nearest neighbors. The operation of UMAP can be modified according to two parameters, namely the number of neighbors of a data point it takes into consideration and how tightly clustered neighboring data points are visualized in the resulting graph. By varying these parameters, one may control how much local versus global structure is preserved in the final projection. Here we apply a PCA to the 384-dimensional S-BERT vectors retrieved for each PrEP tweet such that they can be positioned in a lower-dimensional semantic space spanned by the respective PCA components that explain the greatest amount of variance in the original data, followed by a UMAP procedure that positions each tweet in a two-dimensional visualization.
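The PCA projection step can be illustrated with a few lines of NumPy. This is a sketch on synthetic data, not the authors' implementation: the function name, the SVD-based formulation, and the random test matrix are assumptions chosen to show that components come out ordered by the share of variance they explain.

```python
import numpy as np


def pca_project(X, n_components):
    """Project rows of X onto the top principal components.

    Returns the projected data and the fraction of total variance
    explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by singular value.
    _, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (X.shape[0] - 1)
    explained_ratio = explained_variance / explained_variance.sum()
    projected = centered @ Vt[:n_components].T
    return projected, explained_ratio[:n_components]


# Synthetic stand-in for embeddings: 200 points, 5 features of unequal spread.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
projected, explained_ratio = pca_project(X, n_components=2)
```

The first retained component explains at least as much variance as the second, which is the ordering property that lets a small number of components summarize the original space.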

K-Means Clustering. We divide the tweets in the UMAP visualization into a set of visually distinct clusters using k-means clustering. The k-means clustering algorithm [26] partitions a dataset into k number of highly dense sets of data points by adjusting a set of cluster centers such that the assigned clusters minimize the distance between the cluster center and the data points that are assigned to the cluster. K-means clustering requires the value k to be specified, which in the case of two-dimensional data may be determined by visually or qualitatively analyzing the data set for natural visually compelling groupings, such as distinct groupings of data points on a plot or logical divisions of topics in a set of text samples. Integrating dimensionality reduction (PCA and UMAP) with k-means clustering supports the visual analysis of (any) topical clusters occurring within the set of tweets analyzed.
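The alternating assignment and update steps of k-means can be sketched in plain Python as below. This is a minimal version of the standard Lloyd iteration on toy two-dimensional points, not the implementation used in the study. The function names, the random initialization from the data points, and the iteration cap are assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Partition points into k clusters; returns (labels, centers)."""
    centers = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


# Two well-separated toy groups standing in for points on the 2-D map.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centers = kmeans(toy_points, k=2)
```

Because the two toy groups are far apart, the procedure separates them regardless of which points are drawn as initial centers; on real data the result can depend on initialization.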

Procedure

The first step in our process consisted of retrieving S-BERT vectors for each tweet. These high-dimensional (n = 384) sentence vectors were then reduced to two dimensions using a combination of PCA and UMAP such that each tweet could be placed on a 2-dimensional map according to their semantic similarity allowing for a visual analysis of the data. A wide range of parameter values were tested for PCA and UMAP, producing two-dimensional mappings of the corpus for visual analysis. These two-dimensional maps were subject to a k-means clustering procedure which assigned each tweet to a cluster, thereby codifying the visual clustering of the set of 4,020 tweets in the map to an explicit partitioning.

The approach that was determined to be the most effective used PCA to reduce the initial S-BERT embeddings to 40 dimensions. We then further reduced to 2 dimensions using UMAP with parameters of 20 nearest neighbors and a minimum distance of 0.1. These parameter values direct UMAP to prioritize local groupings of data points more heavily than global structure, and were best able to preserve the structure of topical clusters out of all parameters tested.

Visual analysis of this data indicated that roughly 25 distinct topical clusters were present, which informed the application of k-means clustering to partition the data set into 25 clusters. Two of our authors, serving in part as domain experts, analyzed the resulting tweet clusters that the k-means algorithm identified. They independently generated topic summaries of the content of the tweet clusters they examined. See Fig. 2. Their summaries were compared by the lead author of this study, and overlap between summaries was deemed as sufficient agreement for interpretation.
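Taken together, the reported settings (PCA to 40 dimensions, UMAP with 20 nearest neighbors and a minimum distance of 0.1, and k-means with 25 clusters) could be wired up roughly as follows. This is a hedged sketch rather than the authors' code: it assumes the sentence-transformers, scikit-learn, and umap-learn packages, and the model name is an assumption, since the text specifies only that the embeddings are 384-dimensional. The seed and the `n_init` value are likewise illustrative.

```python
# Parameter values reported in the Procedure section.
PIPELINE_PARAMS = {
    "pca_components": 40,
    "umap_neighbors": 20,
    "umap_min_dist": 0.1,
    "umap_components": 2,
    "kmeans_clusters": 25,
}

# Assumption: the checkpoint is not named in the text; this public model
# produces 384-dimensional sentence embeddings and is used for illustration.
ASSUMED_MODEL_NAME = "all-MiniLM-L6-v2"


def embed_reduce_cluster(texts, params=PIPELINE_PARAMS, seed=0):
    """Return (2-D coordinates, cluster labels) for a list of tweet texts."""
    # Third-party imports live inside the function so the sketch can be
    # loaded and inspected without the libraries installed.
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    import umap  # provided by the umap-learn package

    # Step 1: one embedding vector per tweet.
    embeddings = SentenceTransformer(ASSUMED_MODEL_NAME).encode(texts)

    # Step 2: linear reduction to 40 dimensions.
    reduced = PCA(
        n_components=params["pca_components"], random_state=seed
    ).fit_transform(embeddings)

    # Step 3: non-linear reduction to a 2-D map.
    coords = umap.UMAP(
        n_neighbors=params["umap_neighbors"],
        min_dist=params["umap_min_dist"],
        n_components=params["umap_components"],
        random_state=seed,
    ).fit_transform(reduced)

    # Step 4: partition the map into k clusters.
    labels = KMeans(
        n_clusters=params["kmeans_clusters"], n_init=10, random_state=seed
    ).fit_predict(coords)
    return coords, labels
```

Calling `embed_reduce_cluster(list_of_tweet_texts)` requires at least as many texts as PCA components and clusters; a corpus of a few thousand tweets, as in this study, satisfies that comfortably.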

Fig. 2.

Diagram showing how we produced a visual map and topical clustering of N = 4,020 PrEP-relevant tweets, partitioned into 25 clusters revealing relevant online PrEP topics, the content of which were validated by a team of experts

Results

We mined n = 4,020 tweets that matched the mentioned PrEP relevant terms and translated their semantics in a visualization using S-BERT, UMAP, and K-Means clustering. Broadly, we identified several observations regarding the myriad contexts in which PrEP is discussed online. We present our findings briefly without comment.

RQ1: Can we meaningfully consolidate PrEP related tweets into themes or ideas?

Using the data processing pipeline shown in Fig. 2, we identified 25 unique themes that occurred among the N = 4,020 tweets in our sample. Tables I and II outline all themes, delineated by the name of each theme, a brief definition, and a list of ten words most associated with each theme. Though most clusters were clear and interpretable, we observed one cluster with a string of unclear terms and phrases (see Cluster 18). Lists of associated terms were generated by frequency of terms within each cluster, and the presence of apparently non-topical terms such as “I” or “re-tweet” is a natural artifact of this process.
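A frequency-based term list of this kind can be sketched as follows. The whitespace tokenization and the function name are assumptions, since the text does not describe how tweets were tokenized. Because no stop words are removed, common tokens such as “I” or “RT” can surface among the top terms, consistent with the artifact noted above.

```python
from collections import Counter, defaultdict


def top_terms_per_cluster(texts, labels, n_terms=10):
    """Return the n most frequent whitespace-delimited tokens in each cluster."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        counts[label].update(text.split())
    return {
        label: [term for term, _ in counter.most_common(n_terms)]
        for label, counter in counts.items()
    }


# Toy tweets with hypothetical cluster assignments.
toy_texts = ["RT Truvada cost", "Truvada generic cost", "Descovy approved FDA"]
toy_assignments = [0, 0, 1]
toy_terms = top_terms_per_cluster(toy_texts, toy_assignments, n_terms=2)
```

Here cluster 0 is summarized by its two most frequent tokens, "Truvada" and "cost", each of which appears twice.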

Table I.

List of Clusters and Associated Words per Cluster [1–13]

| Cluster | Name | Definition | Terms |
| --- | --- | --- | --- |
| 1 | Generic PrEP | This cluster refers to tweets about the cost of generic Truvada medications | Truvada, PrEP, RT [Re-tweet], Costs, I, Insurance, Generic, Drug, Cost, the |
| 2 | Truvada Marketing | This cluster refers to a collection of tweets about the marketing of Truvada and Descovy [PrEP medications] | Truvada, RT, Commercial, Commercials, I, Difference, PrEP, Hunty, Descovy, Borrow |
| 3 | Truvada vs. Descovy | This cluster refers to a mix of comments and narratives on switching from Truvada to Descovy | Truvada, Side, Descovy, Better, PrEP, TR, Switch, Insurance, available, Generic |
| 4 | Injectable PrEP | This cluster refers to a mix of anecdotes about a long-acting injectable outperforming Truvada, and PrEP effectiveness | Truvada, I, HIV, Drug, RT [Re-tweet], Cabotegravir, Pill, Effective, PrEP, Prevention |
| 5 | PrEP Patent | This cluster refers to criticisms of Truvada’s patent by Gilead Sciences and news of the patent being overturned by the court system | Truvada, the, Patent, Court, Gilead, Generic, RT [Re-tweet], Patents, PrEP, Drug |
| 6 | PrEP Uptake | This cluster refers to a general collection of tweets on PrEP access and uptake | Truvada, Taking, PrEP, #PrEP, I, You, RT [Re-tweet], is, the, Find |
| 7 | Injectable PrEP vs. Pill | This cluster refers to a collection of tweets discussing a New York Times article on efficacy of administration methods for PrEP | RT [Re-tweet], A, Single, Injection, Months, Prevented, HIV, Better, Daily, Pill |
| 8 | Truvada Lawsuit | This cluster refers to a collection of tweets regarding PrEP ads and class-action lawsuits against Gilead [the maker of Truvada] | Truvada, Insurer, Ads, HIV, Drug, RT, Action, People, Lawsuit, Taking |
| 9 | Truvada Alternatives | This cluster refers to a collection of tweets regarding alternatives to Truvada, including Descovy, a monthly PrEP injection | Single, HIV, Injection, Better, A, Daily, Months, Pill, Prevented, RT |
| 10 | Gilead Price Gouging | This cluster refers to a collection of tweets regarding the exorbitant pricing by Gilead for Truvada | Truvada, Price, Gilead, PrEP, HIV, Gilead Sciences, Drug, Costs, RT [Re-tweet], People |
| 11 | Descovy for PrEP | This cluster refers to a collection of tweets regarding the Food and Drug Administration’s approval of Descovy for PrEP (the second medication option) | Descovy, Drug, HIV, Men, PrEP, RT, Women, #PrEP, Approved, FDA |
| 12 | Descovy versus Truvada | This cluster refers to a collection of tweets comparing Descovy to Truvada; likely originating from a different news cycle than Cluster #8 | Truvada, a, Pill, Single, Injection, Prevented, Months, HIV, Daily, Better |
| 13 | Truvada Adverse Effects | This cluster refers to a collection of tweets on Truvada Adverse Effects, including Renal Failure | Truvada, the, I, Side, PrEP, Effects, RT [Re-tweet], Drug, Kidney, It |

Table II.

List of Clusters and Associated Words Per Cluster [14–25]

| Cluster | Name | Definition | Terms |
| --- | --- | --- | --- |
| 14 | Truvada Side Effects | This cluster refers to a collection of tweets referring to the co-occurrence of PrEP uptake with chronic fatigue syndrome | Truvada, Patient, Chronic, Claims, Fatigue, Improvement, Syndrome, AIDS, Another, Drug |
| 15 | PrEP and Risky/Anonymous Sex | This cluster refers to a collection of tweets about perceptions of others on PrEP/How PrEP can instigate condomless sex | Truvada, Anonymous, Whore, Become, #bareback, Strangers, #gayhypnosis, #cumdump, Unclear, Lose |
| 16 | Truvada and Gay Identity | This cluster refers to a collection of tweets about gay identity and PrEP uptake | Truvada, People, Gay, PrEP, I, HIV, Gays, Men, RT, Taking |
| 17 | Truvada, PrEP, and Associated Costs | This cluster refers to a collection of tweets about the high costs of PrEP | Truvada, the, HIV, Cost, Drug, Per, RT, Price, Costs, PrEP |
| 18 | Unclear Topic | This cluster refers to a collection of randomly, seemingly unrelated tweets | Truvada, I’m, I, PrEP, RT, You, I’m, \|, Gonna, Bottle |
| 19 | Truvada with no generic option | This cluster refers to a collection of tweets about Gilead’s Truvada patent and lack of generic alternatives | Truvada, US, Gilead, Generic, RT, Costs, PrEP, The, Price, Year |
| 20 | PrEP Effectiveness | This cluster refers to a collection of tweets referring to the general effectiveness of PrEP for HIV prevention | Truvada, Prevention, HIV, #HIV, PrEP, Prevention, RT, Effective, Drug, The |
| 21 | PrEP in China | This cluster refers to a collection of tweets referring to China approving Truvada for HIV prevention | Truvada, PrEP, Drug, Prevention, China, HIV, Approved, RT, --‘, HIV-Preventing |
| 22 | Gilead charity | This cluster refers to a collection of tweets referring to Gilead Science, the makers of Truvada, donating over 200,000 doses of PrEP to at-risk groups | Truvada, Sciences, HIV, Drug, Gilead, Free, Donate, RT, Prevention, US |
| 23 | Gilead Descovy Comments | This cluster refers to a collection of tweets referring to criticisms directed at Gilead for announcing Descovy as a ‘safer’ PrEP alternative only after Truvada goes generic | Descovy, Truvada, PrEP, Gilead’s, Gilead, Women, Drug, RT, FDA, HIV |
| 24 | Gilead and Truvada Damage Control | This cluster refers to a collection of tweets about Gilead donating Truvada doses to mitigate bad press about patents | Truvada, Donate, Gilead, Patent, HIV, US, Drug, HIV, Prevention, RT |
| 25 | Descovy as a viable PrEP option | This cluster refers to a collection of tweets referring to marketing Descovy and other Descovy related advertisements | Descovy, It, I, Truvada, PrEP, I’m, RT, It’s, Commercial, Drug |

Cluster content spanned a wide range of relevant topics, reflecting a diversity of contexts that included the quotidian preoccupations of PrEP users. For example, clusters emerged relative to general PrEP uptake and PrEP information. We also identified several topics believed to discuss some of the side effects associated with PrEP including chronic fatigue syndrome and renal/hepatic issues. As a likely consequence of reported side effects, we also observed topics related to lawsuits against Gilead Sciences, the maker of Truvada, and Truvada alternatives (i.e., Descovy, a monthly injectable PrEP medication). Lastly, we observed several topics related to condomless sex or MSM hookups among those on an active PrEP regimen.

RQ2: How can interdisciplinary frameworks add additional nuance to online conversations about PrEP and other Public health topics more broadly?

By leveraging the visualization tools described above we mapped the corpus of tweets into 25 topical clusters, as depicted in Fig. 3. These clusters represent bodies of tweets that are semantically and contextually similar, and are placed in the map such that their position reflects their relative similarity to other clusters. As a result of the K-means clustering algorithm, several outlying groups of tweets were unavoidably assigned their own clusters. These clusters (6, 10, 12, 13, and 20) comprised fewer than 10 tweets each.

Fig. 3.

A vector map depicting 25 clusters identified in a collection of (n = 4,020) tweets. Clusters in close proximity are semantically and contextually similar; clusters that are distal are semantically and contextually unrelated. Clusters on the right of the figure depict outlying clusters not visible in this map.

Clusters in the center (i.e., topics 25, 11, 19, 2, and others) are general PrEP topics, which in some capacities are similar to all clusters located in the vector map. For example, cluster 25 contains tweets discussing Truvada costs and cluster 11 discusses PrEP medication relative to other methods of HIV prevention. Clusters further from the center are typically responses to news or specific events related to PrEP medications.

Close proximity among clusters indicates tweets aligned with themes identified in our analysis that are similar or overlap in content. For example, clusters related to PrEP cost, PrEP generic alternatives, and Gilead price gouging are likely to have close vector representation and appear close to one another in the vector map given the likely similarity of these bodies of tweets.

Distal clusters (e.g., Topics 9 and 7 compared with 21 and 1) indicate bodies of tweets that are semantically and contextually different from each other. For example, the previously mentioned clusters regarding PrEP costs and generic alternatives are most distal from topics/themes about condomless sex and MSM hookups.
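One way to put numbers on "proximal" and "distal" is to compare distances between cluster centroids on the two-dimensional map. The sketch below is illustrative only: the analysis described here relies on visual inspection of the map, and the function names and toy coordinates are assumptions. Distances in a UMAP projection are also best read as rough guides, since the method prioritizes local neighborhoods.

```python
import math
from collections import defaultdict


def cluster_centroids(coords, labels):
    """Mean 2-D position of the points assigned to each cluster."""
    groups = defaultdict(list)
    for point, label in zip(coords, labels):
        groups[label].append(point)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in groups.items()
    }


def centroid_distance(centroids, a, b):
    """Euclidean distance between the centroids of clusters a and b."""
    return math.dist(centroids[a], centroids[b])


# Toy map: clusters 0 and 2 sit close together, cluster 1 lies far away.
toy_coords = [(0, 0), (2, 0), (10, 0), (12, 0), (1, 1)]
toy_cluster_labels = [0, 0, 1, 1, 2]
toy_centroids = cluster_centroids(toy_coords, toy_cluster_labels)
near = centroid_distance(toy_centroids, 0, 2)
far = centroid_distance(toy_centroids, 0, 1)
```

In the toy map the centroid of cluster 0 lies one unit from cluster 2 and ten units from cluster 1, so clusters 0 and 2 would be read as the more closely related pair.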

Discussion

The purpose of this study is to explore online discourse about PrEP using an interdisciplinary public health and computational informatics approach. By analyzing a diverse array of PrEP related tweets, we uncovered several allusions to PrEP promotion efforts and how PrEP is positively and negatively contextualized online.

PrEP as a Medical and Cultural Social Media Phenomenon

PrEP altered the scientific community’s approach to preventing HIV exposure and transmission [27]. As one of the first medications approved during the social media era [28], scientists leveraged such online spaces to promote PrEP uptake and adherence. A natural consequence of using online mediums for information dissemination is the diachronic and public domain nature of these data [29]. Over time online information and interventions for PrEP have created a nine-year trail of online discourse including how PrEP is broadly communicated online.

Most clusters uncovered by our analysis aligned with PrEP information dissemination. This includes topics related to general information about PrEP, medical costs, and various options for PrEP including Truvada (a once-daily oral medication) and Descovy (a once monthly injectable medication). Tweets associated with these clusters suggest sincere efforts to promote and disseminate medically accurate information about PrEP (e.g., TWEET: Talking to your doctor about HIV prevention treatment can be intimidating– but it doesn’t have to be). Similar tweets also addressed ways insurers, companies, and/or advocacy groups could mitigate the cost of PrEP (e.g., TWEET: PrEP to prevent HIV is expensive. We are here to help!). We also identified several topics that compared Descovy and Truvada and the associated pros and cons for each medicine (e.g., TWEET: Truvada, a once daily oral pill; or Descovy, a once monthly injection: Which is right for you?).

We also highlighted several clusters that expressly referred to how PrEP is communicated among the MSM community. These cultural aspects about PrEP are reflected in our findings, which yielded two topics related to MSM hookups, LGBTQIA+ identity, and condomless sex (Topic 15 (Condomless sex and MSM hookups) and Topic 16 (MSM community and PrEP)). Indeed, since PrEP’s inception the prophylactic treatment forged new identities among gay and bisexual men with regard to casual dating, hookups, and perceptions of PrEP users and non-users [30]. For example, users of popular MSM internet hookup sites (i.e., Grindr, Scruff, and others) are increasingly disclosing HIV serostatus and PrEP use as part of their bios or personal profiles [31]. Yet, disclosing one’s serostatus or PrEP use may come at the cost of social stigmatization among others who choose not to follow these practices. Indeed, evidence suggests there is a sharp divide in attitudes and perceptions of MSM who use PrEP, versus those who do not, and discrepancy among each person’s choice. For example, tweets associated with these topics alluded to PrEP use disclosure and how PrEP users perceive themselves and others. (TWEET: If it’s not queer shaming, it’s slut shaming à la “Truvada whore” getting bandied around amongst our own like we don’t have a modern medical miracle sitting in our goddamn laps). These tweets, and others, suggest a certain degree of skepticism in sexual practices adopted by PrEP users, or even uncertainty about the safety of PrEP to reduce HIV infection. This observation is further supported by increases in STIs among MSM engaging in unprotected sex (i.e., barebacking), prompting concerns of drug resistant STI strains [32]. Ongoing research on identity formation should examine the effects of PrEP on gay and bisexual social circles, including how PrEP regimens alter personal social networks in a hookup context.

Nuanced PrEP Topics Reveal Social and Medical Barriers Inhibiting Uptake

Since the FDA approved PrEP in 2012, an estimated 1 million global (and 100,000 US) adults regularly use PrEP to prevent HIV-1 infection [33]. Yet, estimates indicate that PrEP remains underutilized among all eligible groups [34]. In the United States, only one quarter of eligible adults (i.e., MSM, trans women, and people who inject drugs) are on a PrEP regimen [35]. Global estimates highlight similarly low uptake rates in Sub-Saharan Africa, Asia, and Latin America [36–38]. Due to poor uptake and adherence, there have been efforts to identify barriers that may inhibit PrEP use and access. Though barriers are numerous, and often context and country-specific, recurring concerns among US and global populations include: lack of knowledge/awareness about HIV and PrEP, perceived HIV risk, social stigma, healthcare mistrust, lack of access, financial burden, among others [39].

Several topics along the periphery of our vector map alluded to complications and barriers of PrEP uptake. Topics along the periphery suggest these conversations are not the unilateral focus of the corpus but represent side conversations tangentially related to central topics located at the center of the vector map (i.e., PrEP information dissemination). Some barriers alluded to in these topics are highly documented including associated PrEP costs and insurance coverage. However, amongst topics alluding to documented barriers to PrEP use, we also identified additional barriers that are well-known in legal circles yet somewhat empirically understudied, including PrEP effectiveness, PrEP side effects, exorbitant costs and price-gouging, and lawsuits against Gilead Sciences, the maker of Truvada. These topics strongly suggest that at least some discourse about PrEP is framed negatively, particularly calling into question the long-term effects of continued PrEP uptake. These concerns are not new and are well documented [40]. Indeed, beginning in 2019, several state and federal lawsuits (and one class action lawsuit) were filed against Gilead Sciences. These lawsuits allege Gilead knew, or should have known, the active medication in Truvada could lead to serious side-effects including bone density loss, renal damage, and liver failure if not carefully monitored by medical providers [41]. Financial complaints similarly allege Gilead’s patent on Truvada (which would prevent the creation of a generic alternative) only served profiteering purposes by increasing the monthly cost of PrEP between $1200 and $2000 USD [10, 42], though generic alternatives for Truvada have since become available.

Independently, PrEP-related concerns and/or barriers identified in our model should not affect PrEP uptake. However, given low PrEP uptake and adherence, there is evidence these controversies may be adversely affecting PrEP adoption and maintenance [43]. Indeed, the short- and long-term effects of misinformation, disinformation, and ideological echo chambers on individual health beliefs and behaviors are persistent concerns in e-health and digital health science [44]. Social media’s sordid history of ideological polarization further supports these concerns with regard to information seeking and dissemination among like-minded groups. Regarding PrEP, in 2019, an influx of social media ads likely targeting anti-PrEP groups began seeking plaintiffs in lawsuits targeted at Gilead. A study on the effects of such ads concluded that nearly half of participants who viewed the ads would either never start PrEP or discontinue their current PrEP regimen [10]. This, coupled with conflicted perceptions of PrEP users among MSM, suggests that tailored messaging is needed to mitigate the effects of misinformation and disinformation campaigns. Future studies should continue studying online PrEP discourse, including identifying sources of misinformation and how to counter it. Interventionists should also consider leveraging pockets of disinformation as viable insights/sources for interventions promoting accurate medical information.

Insights into Interdisciplinary Public Health & Computational Informatics Collaborations

Public health has historically borrowed methods, tools, and algorithms from computer science and computational informatics to mine social media data. As shown in our study, the synergistic use of public health frameworks with computer science tools uncovered nuanced portrayals of PrEP. These include PrEP’s evolution from medical novelty into a cultural phenomenon, in addition to controversies that may be weaponized via misinformation campaigns.

Collectively, these findings suggest that data derived from social media can provide a more comprehensive portrait of PrEP and of other medical interventions more broadly. However, limited understanding of all available tools and how best to apply them may create uncertainty about the validity of these data. In many scientific disciplines, including public health, social media data bear an unfortunate reputation for being unreliable and non-efficacious. Mistrust of social media data may stem from the radical departure of social media analytics from traditional quantitative (e.g., multiple regression) and qualitative (e.g., focus groups, interviews) analyses, or from the secondary nature of scraped social media data, which may be days, months, and in some cases, years old. However, the uniqueness of social media data necessitates methods to facilitate extraction, analysis, and synthesis of individual posts, timelines, and collections of timelines. Inherently, these methods must also account for the time-variant nature of social media and how those data contribute to a larger narrative, regardless of the data’s age. Refined algorithms in computer science, and other similarly technical fields, have afforded the opportunity to visualize the scope, scale, and precision of social media data, including the methods undertaken herein. Indeed, the combined S-BERT, UMAP, and K-means approach can group similar tweets in a corpus into clusters that represent pockets of dialogue that, when mapped by vector representation, illustrate the semantic and contextual ways medical necessities are communicated online.
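
The final clustering step of such a pipeline can be illustrated in isolation. The following is a minimal sketch and not the code used in this study: it assumes the tweets have already been embedded and reduced to two dimensions, the coordinates are invented, the `kmeans` helper is hypothetical, and it uses two clusters rather than the 25 reported here.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then update centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D coordinates standing in for dimension-reduced tweet embeddings.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

In this toy case the two groups of points are well separated, so each group receives its own label. With real embeddings, the choice of the number of clusters and of the initial centroids would need to be justified separately.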

This study has timely implications for computational informatics and public health. From an informatics perspective, this study contributes to a body of research on social media discourse and misinformation. From a public health perspective, there are numerous implications associated with our findings, including how these data can be leveraged even further for interventions. First, our analysis identified pockets of possible misinformation (and how it may impact PrEP uptake and maintenance). These bodies of tweets capitalized on controversies associated with Gilead Sciences, possibly creating concerns about Truvada’s safety, effectiveness, and unintended consequences of PrEP use. We also identified PrEP uptake as a form of judgement among people with differing views, for example, MSM PrEP users versus non-PrEP users and conflicts between them. This is not the only instance of communicative tensions online regarding medical interventions; COVID-19 vaccination status has played out similarly. While these insights are, by themselves, informative, we can increase granularity by leveraging metadata associated with tweets and clusters of tweets. Indeed, by leveraging metadata, we can determine whether tweets sharing information about Truvada’s adverse effects come from social media users or bot accounts, defined as automated accounts that are not operated by humans. If these tweets originate from individual users, then it is possible to mine individual timelines to determine personal characteristics about that user, including markers of mental illness and other patterns of social media behavior that may be facilitating problematic online posting habits. We can also examine the relative popularity of these tweets, as determined by the frequency with which they are shared among social networks. From a public health standpoint, these represent potential intervention targets, namely groups that may be misinformed about Truvada, to promote accurate PrEP information.

Ethical Considerations for Mining Social Media Data Among Vulnerable Populations

Although social media analysis represents an important potential resource for developing insight into attitudes and behaviors relating to public health concerns, caution must be urged when undertaking such investigations. Indeed, ethical concerns related to social media mining, including consent, exploiting data, and unknowingly scraping personal social media feeds, represent persistent challenges in this area of research [45]. These challenges are particularly noteworthy for at-risk and vulnerable populations. Indeed, we must remember that the sero-status of individuals who are at risk of or currently living with HIV represents a deeply personal and sensitive characteristic. As such, in studies relating to PrEP interventions, and in studies involving vulnerable populations more broadly, the utmost care must be taken to ensure the anonymity and security of personal data. This study adhered closely to ethical principles for social media mining, including anonymizing data and deleting all personally identifiable account information. We highly encourage the adoption of these practices for any study involving social media data related to sensitive topics, including HIV dialogue. We also strongly discourage the analysis of a small or select number of tweets/accounts with deep learning tools, as more data typically afford greater degrees of anonymity and accuracy of post classification.
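
One way such anonymization might look in practice is sketched below. It is an illustration under stated assumptions rather than the procedure used in this study: the record layout, the field names in `IDENTIFYING_FIELDS`, and the `anonymize` helper are all hypothetical, and real tweet records may carry further identifying fields.

```python
import re

MENTION = re.compile(r"@\w+")        # @username mentions inside tweet text
URL = re.compile(r"https?://\S+")    # links that may point back to an account

# Hypothetical account-level fields to delete; adapt to the actual record schema.
IDENTIFYING_FIELDS = {"user_id", "screen_name", "display_name", "location"}

def anonymize(record):
    """Drop account-level fields and mask mentions and links in the text."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    text = URL.sub("<URL>", cleaned.get("text", ""))
    cleaned["text"] = MENTION.sub("<USER>", text)
    return cleaned

# Invented example record, not drawn from the study corpus.
record = {
    "user_id": 12345,
    "screen_name": "example_user",
    "location": "Somewhere",
    "text": "@friend have you asked about PrEP? https://example.org/info",
}
clean = anonymize(record)
print(clean)
```

Masking rather than deleting mentions and links keeps the sentence structure intact for downstream embedding while removing the pointers to individual accounts.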

Limitations

This work is subject to limitations we hope to address in future research. First, our study was limited by Twitter’s API, which allows users to only collect 1% of total tweet volume for a given search query. This threshold resulted in a relatively small sample of N = 4,020 tweets, which were likely intended for a highly specific population (i.e., men who have sex with men, people who inject drugs, or other persons at risk of contracting HIV). As a consequence of the limited populations for which PrEP is intended, it is likely there was less social media discourse relative to other medical interventions intended for the general population, such as the COVID-19 vaccine. Second, our study relied on tweets matching a limited set of PrEP-related search terms, thus excluding a wide range of tweets that may be relevant to PrEP but did not contain these specific terms. In other words, our tweet sample inclusion criterion favored high precision, because every tweet in our sample exactly matched one of a small set of PrEP-relevant terms, but likely yielded low recall. The outcomes of this analysis, however, can point to more sophisticated methods that identify a wider range of PrEP-relevant tweets and may do so in a manner that adapts to the changing landscape of online communities of interest. Third, our study was also limited given that we did not perform a full qualitative analysis of tweet content. Future studies should consider validating our findings by conducting a full qualitative analysis of this corpus.
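
The precision and recall trade-off of exact keyword matching can be made concrete with a small worked example. The sketch below uses an invented five-tweet corpus with hand-assigned relevance labels and a single hypothetical search term; none of these are the terms or data used in this study.

```python
def precision_recall(retrieved, relevant):
    """Precision = TP / retrieved; recall = TP / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Invented toy corpus: tweet id -> (text, hand-assigned relevance to PrEP).
corpus = {
    1: ("Started PrEP last month, no side effects so far", True),
    2: ("Is Truvada covered by my insurance?", True),
    3: ("My daily HIV prevention pill costs too much", True),
    4: ("Asked my doctor about PrEP today", True),
    5: ("Meal prep Sunday!", False),
}

search_terms = ["PrEP"]  # hypothetical, matched case-sensitively as an exact substring

retrieved = [
    tweet_id
    for tweet_id, (text, _) in corpus.items()
    if any(term in text for term in search_terms)
]
relevant = [tweet_id for tweet_id, (_, is_relevant) in corpus.items() if is_relevant]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here every retrieved tweet is relevant, yet the tweets that mention Truvada or an "HIV prevention pill" without the exact term are missed, which is the pattern of high precision and lower recall described above.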

Conclusions

Our findings demonstrate that leveraging interdisciplinary collaboration between computational informatics and public health can provide insight into discourse surrounding complex issues such as PrEP. Social media data contains a wealth of information regarding public attitudes towards health issues but extracting the nuances of these narratives requires the analysis of large amounts of unstructured data. By developing a research framework utilizing deep learning neural networks and pattern recognition tools to prepare data for qualitative analysis grounded in public health research, we were able to distill large data corpora into more coherent topical groupings for exploratory interpretation. The findings of this study indicate a need for deeper analysis into PrEP discourse on social media, as well as an opportunity to extend our research framework towards better understanding other public health issues.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (566.3KB, png)

Author contributions

AE collected/analyzed the data and wrote technical aspects of the manuscript; DV coded data and wrote the initial draft of the manuscript; EWB provided expert content support and reviewed/edited drafts of the manuscript; and JB conceptualized the study, oversaw the execution of analyses, and edited the manuscript.

Funding

AE was partially funded by the National Science Foundation NRT grant 1735095, “Interdisciplinary Training in Complex Networks and Systems.” Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Declaration

Conflict of interest

Not Applicable.

Ethical Review

This study was deemed exempt by the Institutional Review Board (IRB) given the secondary nature of data collection and analysis.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. McCormack S, Dunn DT, Desai M, Dolling DI, Gafos M, Gilson R, et al. Pre-exposure prophylaxis to prevent the acquisition of HIV-1 infection (PROUD): effectiveness results from the pilot phase of a pragmatic open-label randomised trial. The Lancet. 2016;387(2):53–60. doi: 10.1016/S0140-6736(15)00056-2.
  • 2. Riddell JIV, Amico KR, Mayer KH. HIV Preexposure Prophylaxis: A Review. JAMA. 2018;27(12):1261–8. doi: 10.1001/jama.2018.1917.
  • 3. Volk JE, Marcus JL, Phengrasamy T, Blechinger D, Nguyen DP, Follansbee S, et al. No New HIV Infections With Increasing Use of HIV Preexposure Prophylaxis in a Clinical Practice Setting. Clin Infect Dis Off Publ Infect Dis Soc Am. 2015;15(10):1601–3. doi: 10.1093/cid/civ778.
  • 4. Huang YLA, Tao G, Smith DK, Hoover KW. Persistence With Human Immunodeficiency Virus Pre-exposure Prophylaxis in the United States, 2012–2017. Clin Infect Dis. 2021 Feb 1;72(3):379–85.
  • 5. Patel VV, Ginsburg Z, Golub SA, Horvath KJ, Rios N, Mayer KH, et al. Empowering With PrEP (E-PrEP), a Peer-Led Social Media–Based Intervention to Facilitate HIV Preexposure Prophylaxis Adoption Among Young Black and Latinx Gay and Bisexual Men: Protocol for a Cluster Randomized Controlled Trial. JMIR Res Protoc. 2018 Aug 28;7(8):e11375.
  • 6. Dehlin JM, Stillwagon R, Pickett J, Keene L, Schneider JA. #PrEP4Love: An Evaluation of a Sex-Positive HIV Prevention Campaign. JMIR Public health Surveill. 2019 Jun 17;5(2):e12822.
  • 7. Keene L, Dehlin J, Pickett J, Berringer K, Little I, Tsang A, et al. #PrEP4Love: success and stigma following release of the first sex-positive PrEP public health campaign. Cult Health Sex. 2020 Mar 26;23.
  • 8. Walsh-Buhi E, Houghton RF, Lange C, Hockensmith R, Ferrand J, Martinez L. Pre-exposure Prophylaxis (PrEP) Information on Instagram: Content Analysis. JMIR Public health Surveill. 2021 Jul 27;7(7):e23876.
  • 9. AIDSVu. Mapping PrEP: First Ever Data on PrEP Users Across the U.S. [Internet]. AIDSVu. 2018 [cited 2021 Nov 14]. Available from: https://aidsvu.org/prep/.
  • 10. Grov C, Westmoreland DA, D’Angelo AB, Pantalone DW. How Has HIV Pre-Exposure Prophylaxis (PrEP) Changed Sex? A Review of Research in a New Era of Bio-behavioral HIV Prevention. J Sex Res. 2021 Sep 2;58(7):891–913.
  • 11. Jaidka K, Giorgi S, Schwartz HA, Kern ML, Ungar LH, Eichstaedt JC. Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods. Proc Natl Acad Sci. 2020;117(19):10165–71. doi: 10.1073/pnas.1906364117.
  • 12. Bathina KC, ten Thij M, Lorenzo-Luaces L, Rutter LA, Bollen J. Individuals with depression express more distorted thinking on social media. Nat Hum Behav. 2021;11:1–9. doi: 10.1038/s41562-021-01050-7.
  • 13. ten Thij M, Bathina K, Rutter LA, Lorenzo-Luaces L, van de Leemput IA, Scheffer M, et al. Depression alters the circadian pattern of online activity. Sci Rep. 2020;14(1):17272. doi: 10.1038/s41598-020-74314-3.
  • 14. Bollen J, Gonçalves B, van de Leemput I, Ruan G. The happiness paradox: your friends are happier than you. EPJ Data Sci. 2017;18(1):4. doi: 10.1140/epjds/s13688-017-0100-1.
  • 15. Valdez D, ten Thij M, Bathina K, Rutter LA, Bollen J. Social Media Insights Into US Mental Health During the COVID-19 Pandemic: Longitudinal Analysis of Twitter Data. J Med Internet Res. 2020;22(12):e21418. doi: 10.2196/21418.
  • 16. Valdez D, Picket AC, Young BR, Golden S. On Mining Words: The Utility of Topic Models in Health Education Research and Practice. Health Promot Pract. 2021 May 1;22(3):309–12.
  • 17. Valdez D, Patterson M, Prochnow T. The importance of interdisciplinary frameworks in social media mining: An exploratory approach between Computational informatics and Social Network Analysis (SNA). Health Behav Res [Internet]. 2021 Aug 19;4(2). Available from: https://newprairiepress.org/hbr/vol4/iss2/4.
  • 18. Karisani P, Karisani N. Semi-Supervised Text Classification via Self-Pretraining. ArXiv210915300 Cs [Internet]. 2021 Sep 30 [cited 2021 Nov 15]; Available from: http://arxiv.org/abs/2109.15300.
  • 19. Roshanzamir A, Aghajan H, Soleymani Baghshah M. Transformer-based deep neural network language models for Alzheimer’s disease risk assessment from targeted speech. BMC Med Inform Decis Mak. 2021 Mar 9;21(1):92.
  • 20. Alfeo AL, Cimino MGCA, Vaglini G. Technological troubleshooting based on sentence embedding with deep transformers. J Intell Manuf. 2021 Aug 1;32(6):1699–710.
  • 21. Reimers N, Gurevych I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. ArXiv190810084 Cs [Internet]. 2019 Aug 27 [cited 2021 Nov 15]; Available from: http://arxiv.org/abs/1908.10084.
  • 22. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is All you Need. In: Advances in Neural Information Processing Systems [Internet]. Curran Associates, Inc.; 2017 [cited 2021 Nov 15]. Available from: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
  • 23. Ali M, Borgo R, Jones MW. Concurrent time-series selections using deep learning and dimension reduction. Knowl-Based Syst. 2021 Dec 5;233:107507.
  • 24. Diaz-Papkovich A, Anderson-Trocmé L, Gravel S. A review of UMAP in population genetics. J Hum Genet. 2021;66(1):85–91. doi: 10.1038/s10038-020-00851-4.
  • 25. Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. Philos Trans R Soc Math Phys Eng Sci 2016 Apr. 2065;13:20150202. doi: 10.1098/rsta.2015.0202.
  • 26. Likas A, Vlassis N, Verbeek J. The global k-means clustering algorithm. Pattern Recognit. 2003;36(2):451–61. doi: 10.1016/S0031-3203(02)00060-2.
  • 27. Eakle R, Venter F, Rees H. Pre-exposure prophylaxis (PrEP) in an era of stalled HIV prevention: Can it change the game? Retrovirology. 2018 Apr 2;15(1):29.
  • 28. Kudrati SZ, Hayashi K, Taggart T. Social, Media & PrEP: A Systematic Review of Social Media Campaigns to Increase PrEP Awareness & Uptake Among Young Black and Latinx MSM and Women. AIDS Behav [Internet]. 2021 May 3 [cited 2021 Nov 15]; Available from: 10.1007/s10461-021-03287-9.
  • 29. Cougnon LA, de Viron L. Covid-19 and social media: a diachronic discourse analysis for the modeling of linguistic patterns during crises. In 2020 [cited 2021 Nov 15]. Available from: https://dial.uclouvain.be/pr/boreal/object/boreal:235420.
  • 30. García-Iglesias J. “PrEP is like an adult using floaties”: meanings and new identities of PrEP among a niche sample of gay men. Cult Health Sex. 2020 Oct 1;1–14.
  • 31. Medina MM, Crowley C, Montgomery MC, Tributino A, Almonte A, Sowemimo-Coker G, et al. Disclosure of HIV serostatus and pre-exposure prophylaxis use on internet hookup sites among men who have sex with men. AIDS Behav. 2019;23(7):1681–8. doi: 10.1007/s10461-018-2286-z.
  • 32. Scott HM, Klausner JD. Sexually transmitted infections and pre-exposure prophylaxis: challenges and opportunities among men who have sex with men in the US. AIDS Res Ther. 2016 Jan 19;13:5.
  • 33. Celum C, Baeten J. PrEP for HIV Prevention: Evidence, Global Scale-up, and Emerging Options. Cell Host Microbe. 2020;27(4):502–6. doi: 10.1016/j.chom.2020.03.020.
  • 34. van Dijk M, de Wit JBF, Guadamuz TE, Martinez JE, Jonas KJ. Slow Uptake of PrEP: Behavioral Predictors and the Influence of Price on PrEP Uptake Among MSM with a High Interest in PrEP. AIDS Behav. 2021 Aug 1;25(8):2382–90.
  • 35. Hannaford A, Lipshie-Williams M, Starrels JL, Arnsten JH, Rizzuto J, Cohen P, et al. The Use of Online Posts to Identify Barriers to and Facilitators of HIV Pre-exposure Prophylaxis (PrEP) Among Men Who Have Sex with Men: A Comparison to a Systematic Review of the Peer-Reviewed Literature. AIDS Behav. 2018;22(4):1080–95. doi: 10.1007/s10461-017-2011-3.
  • 36. Assaf RD, Konda KA, Torres TS, Vega-Ramirez EH, Elorreaga OA, Diaz-Sosa D, et al. Are men who have sex with men at higher risk for HIV in Latin America more aware of PrEP? PLOS ONE. 2021 Aug 13;16(8):e0255557.
  • 37. Mugo NR, Ngure K, Kiragu M, Irungu E, Kilonzo N. PrEP for Africa: What we have learnt and what is needed to move to program implementation. Curr Opin HIV AIDS. 2016;11(1):80–6. doi: 10.1097/COH.0000000000000224.
  • 38. Zablotska I, Grulich AE, Phanuphak N, Anand T, Janyam S, Poonkasetwattana M, et al. PrEP implementation in the Asia-Pacific region: opportunities, implementation and barriers. J Int AIDS Soc. 2016 Oct 18;19(7Suppl 6):21119.
  • 39. Mayer KH, Agwu A, Malebranche D. Barriers to the Wider Use of Pre-exposure Prophylaxis in the United States: A Narrative Review. Adv Ther. 2020 May 1;37(5):1778–811.
  • 40. D’Angelo AB, Westmoreland DA, Carneiro PB, Johnson J, Grov C. Why Are Patients Switching from Tenofovir Disoproxil Fumarate/Emtricitabine (Truvada) to Tenofovir Alafenamide/Emtricitabine (Descovy) for Pre-Exposure Prophylaxis? AIDS Patient Care STDs. 2021 Aug 1;35(8):327–34.
  • 41. Chan L, Asriel B, Eaton EF, Wyatt CM. Potential Kidney Toxicity from the Antiviral Drug Tenofovir: New Indications, New Formulations, and a New Prodrug. Curr Opin Nephrol Hypertens. 2018;27(2):102–12. doi: 10.1097/MNH.0000000000000392.
  • 42. Ddaaki W, Strömdahl S, Yeh PT, Rosen JG, Jackson J, Nakyanjo N, et al. Qualitative Assessment of Barriers and Facilitators of PrEP Use Before and After Rollout of a PrEP Program for Priority Populations in South-central Uganda. AIDS Behav. 2021;25(1):3547–62. doi: 10.1007/s10461-021-03360-3.
  • 43. Thomann M, Grosso A, Zapata R, Chiasson MA. ‘WTF is PrEP?’: attitudes towards pre-exposure prophylaxis among men who have sex with men and transgender women in New York City. Cult Health Sex. 2018;3(7):772–86. doi: 10.1080/13691058.2017.1380230.
  • 44. Liu Y, Yu K, Wu X, Qing L, Peng Y. Analysis and Detection of Health-Related Misinformation on Chinese Social Media. IEEE Access. 2019;7:154480–9. doi: 10.1109/ACCESS.2019.2946624.
  • 45. Norval C, Henderson T. Contextual Consent. Ethical Mining of Social Media for Health Research [Internet]. arXiv; 2017 [cited 2022 May 15]. Available from: http://arxiv.org/abs/1701.07765.

Articles from AIDS and Behavior are provided here courtesy of Nature Publishing Group
