Highlights
- We investigate the characteristics of the authors of Tweets containing suicidal intent or thinking, through the analysis of their online social network relationships and interactions.
- Results show a high degree of reciprocal connectivity between the authors of suicidal content when compared to other studies of Twitter users, suggesting a tightly-coupled virtual community.
- Analysis of the retweet graph identified bridge nodes and hub nodes connecting users posting suicidal ideation with users who were not, suggesting a potential for information cascade and risk of possible ‘contagion’.
- Retweet graphs of suicidal content exhibit an average shortest path similar to that of a large comparison network, demonstrating large scale information propagation in small-scale networks.
Keywords: Social media, Social network analysis, Twitter, Computational social science, Suicide
Abstract
In this paper we aim to understand the connectivity and communication characteristics of Twitter users who post content subsequently classified by human annotators as containing possible suicidal intent or thinking, commonly referred to as suicidal ideation. We achieve this understanding by analysing the characteristics of their social networks. Starting from a set of human annotated Tweets we retrieved the authors’ followers and friends lists, and identified users who retweeted the suicidal content. We subsequently built the social network graphs. Our results show a high degree of reciprocal connectivity between the authors of suicidal content when compared to other studies of Twitter users, suggesting a tightly-coupled virtual community. In addition, an analysis of the retweet graph has identified bridge nodes and hub nodes connecting users posting suicidal ideation with users who were not, thus suggesting a potential for information cascade and risk of a possible contagion effect. This is particularly emphasised by considering the combined graph merging friendship and retweeting links.
1. Introduction
It is recognised that media reporting about suicide cases has been associated with suicidal behaviour [1]. Concerns have been raised about how media communication may have an influence on suicidal ideation and cause a contagion effect among vulnerable subjects [2]. With the advent of open and massively popular social networking and microblogging Web sites, such as Facebook, Tumblr and Twitter (frequently referred to as social media), attention has focused on how these new modes of communication may become a new, highly interconnected forum for collective communication of suicidal ideation on a large scale. The demographic of online social networks is typically reported to be the younger generation [3], [4] and thus teenagers and young adults are at particular risk. The risk of suicide contagion has been found to be especially high in adolescence and youth [5].
A limited number of studies have been published, reporting a positive correlation between suicide rates and the volume of social media posts that may be related to suicidal ideation and intent [6], [7]. However, to date there is no study that is specifically focused on the connectivity and communication of suicidal ideation between users of social media. Such a study could be important in the light of concern about the normalisation of suicidality and self-harm in social media. There is a small evidence base that suggests a connection between exposure to online self-harm- or suicide-related material and offline self-harming behaviour or suicidal ideation [3].
The research presented in this paper comprises an analysis of data collected from the microblogging website Twitter, the text of which has been classified as containing suicidal ideation by a crowdsourced team of human annotators. We study the connectivity characteristics between users and the propagation of suicidal content. To achieve this we have performed a social network analysis (SNA) of the connections of a specific subset of Twitter users who have been identified as posting content related to suicidal ideation. The SNA is applied to friend and follower connections of the subset of users, as well as investigating the potential content propagation by analysing the retweet graph of posts containing suicidal ideation. More specifically we are addressing the following research questions:
RQ1: With respect to the friends-followers and mutual graphs we focus on measures of graph connectivity to determine whether there is evidence of high connectivity between this specific type of ‘suicidal’ user, or whether these users are instead more isolated and exist within smaller social networks, as reported in [8], [9]. Evidence that would allow us to partially answer this question is expected to be revealed by measurable network characteristics such as ‘average node degree’, ‘graph density’ and ‘shortest path lengths’.
RQ2: Regarding the retweet graph, we would expect traditional connectivity metrics to be less revealing as we do not have a complete network of all social ties (friends/followers) between retweeters. This is primarily because we only collected retweets for the sample set of ‘suicidal users’, due to the long time it would take to collect all users given the frequency/time limitations imposed by Twitter. Nevertheless, we can measure the shortest path metrics, which are a measure of information cascade. High values of the average and maximum shortest path imply greater propagation of information through the network. In addition, starting from an individual belonging to the set of ‘suicidal’ users, we can investigate if there is any evidence of social ties between these users and the Twitter users that have retweeted their posts. Evidence of this nature would allow us to gain insight into whether suicidal content is being restricted within the same community of friends and followers, or if it is propagating outside the user’s social community into the wider network, where it could pose a risk of contagion.
The remainder of the paper is organised as follows. Section 2 describes the related work on this topic. Section 3 describes the data collection method. Sections 4 and 5 describe experiments used to measure connectivity and communication between suicidal users, and discuss the findings. Sections 6 and 7 draw conclusions from the study and identify possible ideas for future work.
2. Related work
A number of studies have recognised evidence that vulnerable subjects can be susceptible to the influence of news and reports of suicide in traditional mass media. The research literature on suicide clusters has supported the link between media reporting and suicide contagion and the impact of fictional and non-fictional news stories of suicide [1], [10]. There have also been recommendations for journalists about news reporting with particular emphasis on the language used in specific parts of a report, for example the headlines, and the differences between reports with national or local coverage [2].
In terms of the social networks of groups of at-risk subjects, the majority of studies derive from medical research. For instance, in [11] the authors posed questions focused on social interactions in a poll of in-patients after a suicide attempt, studying primarily the satisfaction level of social relationships reported by students and the unemployed. In [8] the authors conducted a similar study by investigating the relationship between friendships and suicidality among a larger sample of male and female adolescents in the US. Both studies came to the conclusion that an evaluation of the social network should be an integral part of the clinical investigation of suicidal patients and form a basis for intervention. Furthermore, these studies provide motivation for the research presented in this paper.
However, only a small number of scientific articles have focused on the impact of social media communication. For example, in [6] the authors studied the potential of this new medium for predicting suicides by testing two social media variables (i.e. suicide-related weblog entries) over a period of three years, observing a positive correlation with suicide frequency. In [7] the authors conducted a study in the US on a dataset collected from Twitter using keywords and phrases related to suicide risk factors, filtered geographically by US state. Again they observed a positive correlation against national data of actual suicide rates.
Other studies have focused instead on the language used for the communication of suicidal thoughts, although they have primarily investigated other forms of written communication such as the classification of suicide notes (see [12], [13]). This form of communication is typically more well-formed and less noisy than the type of short, informal language used in social media. Furthermore, the language was being expressed by people about to complete the act of suicide, rather than those expressing thoughts of suicide. In [14] the authors report on depression-related language in Facebook1. Facebook has less constraint on post length than Twitter, allowing more expressive thoughts to be posted; and we should not suggest that depression and suicidal ideation are synonymous, as they are not. Other recent studies have focused on depression and other mental health issues, highlighting the possible beneficial effects of social media communication [15], [16], [17], [18].
More recently, there has been a more direct focus on the subjects potentially at risk of suicide, for example the Durkheim project2 monitored the behavioural intent of a sample of US war veterans and analysed their social media posts on Twitter and Facebook to predict the risk of suicide ([19] and also [20]). However, none of these recent works looked specifically at the social network communication in terms of connectivity between users and propagation of suicidal ideation.
Social network connectivity has been studied by Hsiung [21] who reported the behaviour of an online mental health support group in reaction to a suicide case within the group. [22] reports how users who strongly express either positive or negative emotions heavily associate with each other, and [23] investigated the information contagion effect on a wider set of popular news stories in Twitter and Digg3. A systematic review of the research literature of Internet influences on the risk of self-harm or suicide, with particular focus on young people, is provided in [3].
Monitoring individual social media accounts to detect possible suicidal ideation is controversial territory, as evidenced by the recent withdrawal of the Samaritans Radar app in the UK4, but there is nonetheless potential to contribute to prevention as long as acceptability to social media users is thoroughly investigated. The research presented in this paper continues in this direction by focusing on Twitter as a case study for the analysis of connectivity and communication between people who post suicidal ideation. For the purposes of the paper we will refer to this subset of Twitter users as ‘suicidal users’.
3. The collection of Twitter data
In order to collect and analyse suicidal communication posted to Twitter, we first needed to identify a set of terms that were likely to identify suicidal communication within text. To do this we initially collected text from Web forums via five Web sites5, 6, 7, 8, 9 either dedicated to discussion of suicidal thoughts and feelings or containing a large and easily identifiable body of such material. This resulted in 2000 anonymised forum posts that ranged in length from a few lines to several sentences and paragraphs. Each post was human annotated using the crowd-sourcing online service Crowdflower10. Human annotators were asked to identify content containing suicidal thoughts and feelings. Following the annotation we removed any annotations that were not agreed upon by at least four crowd-workers to be indicative of such emotion.
Term Frequency-Inverse Document Frequency (TF-IDF) analysis was applied to each dataset (suicidal/non-suicidal). This process identified the most frequent terms in each dataset that are not present in the other, thus providing a ranked list of terms that are more likely to be suicidal than not. In this study, we considered terms as n-grams of up to five tokens in length. To further penalise common phrases and words that appear in both suicidal and non-suicidal contexts, while prioritising terms belonging exclusively to the former dataset, TF-IDF was applied by treating each post classified as non-suicidal as a distinct document, whereas the posts expressing suicidal intent were aggregated into a single document. Examples of the most relevant trigrams and five-grams produced by the TF-IDF procedure are given in Table 1.
Table 1.
TF-IDF (trigram) | Trigram | TF-IDF (five-gram) | Five-gram |
---|---|---|---|
169.94 | Want to die | 32.819278 | To take my own life |
126.36 | To kill myself | 24.633562 | Want to die right now |
71.75 | To commit suicide | 22.590259 | Have nothing to live for |
68.18 | Want to kill | 19.691567 | It’s not worth it anymore |
65.64 | Can’t live | 19.691567 | Don’t want to live anymore |
61.18 | To end it | 19.691567 | Me want to kill myself |
58.3 | I’m tired of | 19.691567 | Myself hate my life hate |
54.46 | I hate myself | 19.43643 | Want to be here anymore |
53.81 | End it all | 18.475171 | Want it to be over |
47.44 | End my life | 18.475171 | Want it all to end |
36.95 | Take my own | 18.475171 | Wish could just fall asleep |
33.89 | Kill myself and | 17.612125 | Fall asleep and never wake |
32.82 | My death would | 15.933278 | Want to end it all |
32.79 | To live anymore | 13.127711 | Just really want to die |
31.87 | About killing myself | 13.127711 | Rather die its not worth |
29.73 | Kill myself i | 13.127711 | I’m sorry that im leaving |
29.73 | Never wake up | 13.127711 | Fuck trying to live normal |
28.24 | Killing myself i | 13.127711 | So why should continue living |
26.26 | Stop the pain | 13.127711 | Don’t want to live defeated |
26.26 | Kill myself right | 13.127711 | To commit suicide within few |
25.89 | Thoughts of suicide | 13.127711 | And pain anymore just can |
25.89 | Point in living | 13.127711 | Put an end to this |
24.63 | Worth it anymore | 13.127711 | Been self harming for years |
24.3 | Have nothing to | 13.127711 | Bad really am worthless what |
21.86 | Wanted to die | 13.127711 | Life is this miserable just |
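The ranking procedure described before Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the whitespace tokeniser, the raw-count term frequency, the natural-log IDF and the names `ngrams` and `rank_suicidal_terms` are all assumptions. What it takes from the text is that n-grams of up to five tokens are scored, that the suicidal posts are pooled into a single document, and that each non-suicidal post counts as a separate document.

```python
import math
from collections import Counter


def ngrams(text, max_n=5):
    """Yield every word n-gram of length 1..max_n from the lower-cased text."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenisation
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def rank_suicidal_terms(suicidal_posts, other_posts, max_n=5):
    """Rank the n-grams of the pooled suicidal document by TF-IDF."""
    # Term frequency over the pooled suicidal document. N-grams are counted
    # post by post so that no n-gram spans two different posts.
    tf = Counter()
    for post in suicidal_posts:
        tf.update(ngrams(post, max_n))

    # Document frequency: the pooled suicidal document counts once,
    # and every non-suicidal post is a document of its own.
    doc_freq = Counter(tf.keys())
    for post in other_posts:
        doc_freq.update(set(ngrams(post, max_n)))

    n_docs = 1 + len(other_posts)
    scores = {term: count * math.log(n_docs / doc_freq[term])
              for term, count in tf.items()}
    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranked = rank_suicidal_terms(
    ["i want to die", "want to die now"],
    ["i want to sleep", "nothing to do"],
)
```

Pooling the suicidal posts means that a term appearing only in suicidal text has a document frequency of one, however many suicidal posts contain it, so it receives the largest possible IDF. A term such as "to", which also occurs in every non-suicidal post, scores zero.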
Because of the significant number of irrelevant terms that would not logically be useful as search keywords for the Twitter data collection, the TF-IDF lists were subject to further examination by two experts in the suicide field, leading to a list of 62 key words and phrases used to collect suicidal communication from Twitter, as shown in Table 2.
Table 2.
Asleep and never wake | Just want to sleep forever | Take my own life |
Can’t do this anymore | Kill myself | Thoughts of suicide |
Could just fall asleep | Killing myself | Tired of being alone |
Die in my sleep | Life is so meaningless | Tired of being lonely |
Don’t want to be here | Life is too hard | To end this nightmare |
Don’t want to exist | Life is worthless | To hurt myself |
Don’t want to go on | My death would | To live anymore |
Don’t want to live | My life consists of nothing | Want it to be over |
Don’t want to try anymore | My life is pointless | Want to be alive anymore |
Don’t want to wake up | My life is this miserable | Want to be around anymore |
End it all | My life isn’t worth | Want to be dead |
End my life | Not want to be alive | Want to be gone |
End this pain | Nothing to live for | Want to be here anymore |
Ending it all | Point in living | Want to die |
Hate my life | Put an end to this | Want to disappear |
Hate myself | Ready to die | Want to end it |
I’m drowning | Really need to die | Wanted to die |
I’m leaving now | Stop the pain | Wanting to kill yourself and |
I’m worthless | Suicidal | What is wrong with me |
Isn’t worth living | Suicide | Why should I continue living |
Just want to give up | Take it anymore |
Illustrative examples are ‘asleep and never wake’, ‘don’t want to exist’ and ‘kill myself’. These search terms were then used to collect data from Twitter via the Twitter Application Programming Interface (API)11.
Twitter is a micro-blogging site with 255 million active users worldwide posting an estimated 500 million or more Tweets per day12 on an open and accessible basis. This makes Twitter a suitable source of data for a study into connectivity and propagation of suicidal ideation, but also results in an extremely noisy environment, where posts cover a large variety of topics. As a consequence, the retrieved data must be pre-filtered in order to obtain a sufficient number of posts that can be classified as containing suicidal ideation.
Data were collected from Twitter for a six-week period starting on 1st February 2014, resulting in over four million posts. As a parallel activity, we monitored traditional media over the same period to identify suicide cases of young people in England (focusing on the teenage range of 11–18 years old) and then searched and retrieved data from Twitter containing the name and surname of the deceased. Using the ‘names’ dataset, two expert suicide researchers discussed the features of the Tweets and derived a coding frame concerning not only suicidal thinking and ideation (also including expressions of total despair, even if suicide is not explicitly mentioned) but also memorials, campaigning, information and support, and news reporting. The following seven-class coding frame was developed by these researchers to capture the best representation of how people generally communicate on the topic of suicide.
- 1: Evidence of possible suicidal intent
- 2: Campaigning (i.e. petitions etc.)
- 3: Flippant reference to suicide
- 4: Information or support
- 5: Memorial or condolence
- 6: Reporting news of someone’s suicide (not bombing)
- 7: None of the above
We then extracted a random sample of 1000 tweets from the 4 million collected over a six-week period and repeated the human annotation task using the same crowdsourcing service13, this time asking crowd-workers to classify Tweets into a number of suicide related categories. The reason for selecting a sample of 1000 is that human annotation is a manual and time-intensive task. Similar research into the classification of emotive texts using a human annotated gold-standard has typically used a sample of 1000 to good effect [24], [25], [26], [27].
Our main interest was in the first class of posts containing evidence of possible suicidal intent. As may be expected, this particular type of content is present in Twitter only in a small minority of posts. Following the second human annotation task we removed all Tweets that had less than 75% agreement among crowd-workers and obtained a set of 71 posts classified into this first class (11.8% of a total of 601 with at least 75% agreement among human annotators).
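The agreement filter can be illustrated with a short sketch. How the crowdsourcing service computed agreement is not stated here, so the code assumes the simplest reading: agreement is the share of crowd-workers who chose the most common class for a Tweet. The function names and the toy annotations are hypothetical.

```python
from collections import Counter


def majority_with_agreement(labels):
    """Return the most common label and the fraction of annotators who chose it."""
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)


def filter_by_agreement(annotations, threshold=0.75):
    """Keep only items whose majority label reaches the agreement threshold.

    `annotations` maps an item id to the list of class labels (1..7) assigned
    by individual crowd-workers. Assumption: agreement is an unweighted vote share.
    """
    kept = {}
    for item_id, labels in annotations.items():
        label, agreement = majority_with_agreement(labels)
        if agreement >= threshold:  # "less than 75%" is discarded
            kept[item_id] = label
    return kept


toy_annotations = {
    "tweet_a": [1, 1, 1, 7],  # 75% agree on class 1: kept
    "tweet_b": [1, 3, 7, 7],  # 50% agree on class 7: removed
    "tweet_c": [5, 5, 5, 5],  # 100% agree on class 5: kept
}
kept = filter_by_agreement(toy_annotations)
```

Under this reading, the 71 Tweets in the first class are those retained items whose majority label is 1, which is 71 / 601, or about 11.8%, of the retained set.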
To extend the dataset of Tweets on which to perform our analysis, we also considered any duplicates (Tweets with exactly the same text) of the initial set of 71 that were contained in the whole six-week collection of pre-filtered Tweets. This resulted in a total of 4543 posts that constitute our final dataset of Tweets classified (via human annotation) as containing possible evidence of suicidal intent. The distribution of the duplicates is shown in Fig. 1: the majority of Tweets in the initial set had only a small number (in the order of units) of exact copies in the whole dataset, while only a handful had more than a few hundred. We define the set of authors of these posts as the set S (or ‘suicidal’ set) throughout the paper, for a total of 3535 Twitter users posting this type of content.
Finally, for each Tweet in the resulting set of 4543, we collected all retweets contained in the whole six-week dataset. We identified retweets using a pattern recognition technique that extracted from the six-week collection any post matching the following format: ‘RT ’ + space + ‘@screenname’ + space + ‘:’ + ‘Tweet text’ + ‘some more text (if any)’. This resulted in 2365 retweets, for which Fig. 2 illustrates the distribution, showing long-tail characteristics where the majority of Tweets have very few retweets, but a small number of them have been widely propagated.
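A pattern of this kind can be expressed as a regular expression. The sketch below is one plausible reading of the format, not the expression used in the study: it tolerates variable whitespace around the screen name and the colon, and the names `RETWEET`, `parse_retweet` and `is_retweet_of` are hypothetical.

```python
import re

# 'RT', whitespace, '@screenname', optional whitespace, ':', then the text.
# Assumption: screen names consist of letters, digits and underscores.
RETWEET = re.compile(r"^RT\s+@(\w+)\s*:\s*(.*)$", re.DOTALL)


def parse_retweet(post):
    """Return (screen_name, retweeted_text), or None if the post is not a retweet."""
    match = RETWEET.match(post)
    if match is None:
        return None
    return match.group(1), match.group(2)


def is_retweet_of(post, original_text):
    """True if the post retweets `original_text`, possibly followed by more text."""
    parsed = parse_retweet(post)
    return parsed is not None and parsed[1].startswith(original_text)
```

For example, `parse_retweet("RT @alice: hello world")` returns `("alice", "hello world")`, while a post that merely mentions "RT" somewhere after its first characters is not treated as a retweet.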
4. The friends and followers distributions - measures of connectivity
For each of the authors of the 4543 Tweets classified as containing evidence of possible suicidal intent we retrieved Twitter profile information pertaining to the lists of followers and friends (users followed) so that we could identify measures of connectivity between this type of user. This resulted in two very large sets of 2,376,559 followers and 1,600,498 friends for a list of 3535 distinct authors.
The graph of followers is a directed graph (with the out-going edges representing an ‘is followed by’ relation). Our data show an average number of followers of 528 per user, which is more than double the Twitter average of 208 (see footnote 14). This would suggest a higher than average level of ‘social capital’ within the ‘suicidal’ users in the set S, where ‘social capital’ is a measure of how many people are likely to receive information from the user. Celebrities and politicians typically have high levels of Twitter social capital. The survival (1-cumulative) distribution of followers mirrors the characteristics reported in other studies of follower distributions [28], [29], as visible in Fig. 3.
We also computed the distribution of ‘friends’ (users followed) and a ‘mutual’ list of users that reciprocally follow each other. Having a ‘following’ relationship with many users who post suicidal content could be interpreted as being a ‘consumer’ of such content, while a mutual connection could suggest mutual interest in sending and receiving content. The resulting averages per user were 372 and 313 respectively for ‘friendship’ and ‘mutual’ links with statistical distributions similar in their long-tail shape to the one obtained for the followers lists (here omitted for reasons of space).
The lists of friends and followers presented so far refer to the aggregate of all the friends/followers returned by the Twitter API15 for each of the set of ‘suicidal’ users. Note that the users in these lists were not necessarily expected to belong to the initial set S. However, we were interested in the degree to which this occurs, to establish if there are mutual friendship relationships between users posting suicidal content. This can provide evidence of communities existing around this topic. Fig. 4 confirms that there is indeed a level of reciprocal friendship between users posting suicidal ideation, as evidenced by the survival (1-cumulative) distribution. Although it still follows a long-tail distribution, with the vast majority of users having a small number of links, a notable percentage of users (about 20%) appear to have links with other ‘suicidal’ users.
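The survival (1-cumulative) distribution used in these figures can be computed directly from a list of per-user link counts. The following is a minimal sketch with made-up counts; the function name is hypothetical and the definition assumed is P(X > x) = 1 − P(X ≤ x).

```python
from collections import Counter


def survival_distribution(values):
    """Return sorted (x, P(X > x)) pairs, i.e. one minus the empirical CDF."""
    total = len(values)
    counts = Counter(values)
    points = []
    cumulative = 0
    for x in sorted(counts):
        cumulative += counts[x]  # number of observations <= x
        points.append((x, 1 - cumulative / total))
    return points


# Toy data: number of links each user has to other users in the same set.
links_per_user = [0, 0, 1, 3]
survival = survival_distribution(links_per_user)
```

The value at x = 0 is the share of users with at least one link inside the set, which is the quantity behind a statement such as "about 20% of users have links with other ‘suicidal’ users". In the toy data it is 0.5.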
4.1. Graph representation of friends and followers
Following our identification of some level of connectivity between suicidal users, we proceeded to build graph representations of followers, friends and mutual friends. Here nodes represent users that belong exclusively to the set S of 3535 ‘suicidal’ users and edges represent the ‘follow’, ‘friendship’ (directed) and ‘mutual’ (undirected) links between pairs of users included in this particular class. Figs. 5–7 show the graph representation of the followers graph, which has 833 nodes and 1273 edges (see Table 3) once users without any follower connection within S are discarded.
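Restricting the follower lists to the set S amounts to taking an induced subgraph. A minimal sketch of that step is given here, assuming the follower lists are held in a dictionary keyed by user; the data layout, the function name and the toy user ids are hypothetical.

```python
def induced_graphs(followers, members):
    """Build the follower and mutual graphs restricted to `members` (the set S).

    `followers[u]` is the set of users who follow u. A directed edge (u, v)
    means "u is followed by v". Users with no link inside S are dropped.
    """
    follower_edges = {
        (u, v)
        for u in members
        for v in followers.get(u, ())
        if v in members and v != u
    }
    # A mutual (undirected) link exists when both directions are present.
    mutual_edges = {
        frozenset((u, v)) for (u, v) in follower_edges if (v, u) in follower_edges
    }
    nodes = {user for edge in follower_edges for user in edge}
    return nodes, follower_edges, mutual_edges


S = {"a", "b", "c", "d"}
toy_followers = {
    "a": {"b", "x"},  # x is outside S and is ignored
    "b": {"a", "c"},
    "c": set(),
    "d": {"y"},       # d has no follower inside S and is discarded
}
nodes, follower_edges, mutual_edges = induced_graphs(toy_followers, S)
```

The friends graph can be built in the same way from the lists of users followed, with the edge direction reversed.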
Fig. 5 shows a very sparse graph with many small disconnected sub-graphs visible in the outer circle. However, also visible is a core of nodes that appear connected via a follower relationship. The core of this network is expanded in Fig. 6. In this figure the nodes’ sizes and colours follow a scale according to their degree, representing the ‘is followed by’ relation. The nodes range from red to blue, where red nodes have many followers (more followers = larger node size) and blue nodes have fewer or no followers but are following the most people. Similarly, red edges represent the ‘is followed by’ relationship and blue edges represent ‘follows’. Here we can observe the presence of large red nodes that function as ‘hubs’ in the graph, being connected with (‘followed by’) several other nodes (see also the graph detail in Fig. 6). These nodes could be seen as influential users within the community, having high social capital and the potential to communicate with a wide range of other suicidal users.
Fig. 7 shows a ‘close up’ of one of these hubs. Note that the large size of the node implies the existence of a considerably large set of followers. Moreover, we can observe that this followers set includes other red and orange nodes of considerable size themselves, that in turn have a number of their own followers. This can produce high potential for the spread (cascade) of information over the network.
Nodes in between the red and blue range (in the order of orange, light yellow and light green nodes) can be seen instead as intermediate nodes that both have followers and follow other nodes (in different proportions following the colour order). They then form potential communication bridges among different communities (see Fig. 6). Connecting two communities is therefore likely to support contagion between groups.
Table 3 summarises a number of metrics for the following three graphs of followers, friends and mutual connections. These results provide the statistics for:
- Number of nodes: The number of vertices in the graph.
- Number of edges: The number of links connecting pairs of vertices.
- Graph density: The ratio between the number of edges in the graph and the total number of possible edges.
- Average graph degree: For each vertex the degree is calculated as the number of links that end in that vertex. For directed graphs such as the followers and friends graphs we have calculated the out-degree (number of outgoing edges), representing respectively the ‘is followed by’ and ‘is following’ relations. The average degree is the mean of the degree values over all network nodes.
- Max graph degree: The maximum value of the node degree over all graph vertices.
- Number of connected components: The number of maximal sub-graphs in which any two vertices are connected to each other by a path of edges.
- Largest connected component (LCC): The maximum size (number of nodes) of a connected sub-graph.
- Average clustering coefficient: Firstly we calculate the clustering coefficient for each node as the probability that two randomly chosen distinct neighbours of the given node are connected. This is also referred to as the local clustering coefficient for a node. Then we average these values over all network nodes.
- Number of triangles: Number of triples of nodes all connected pairwise by an edge.
- Transitivity: This is another global measure of clustering and is proportional to the ratio between the total number of triangles and the number of connected triples of vertices (groups of three nodes with at least two edges connecting pairs of them).
- Average shortest path: We first define the path length between two nodes as the number of edges (hops) that must be traversed to connect one to the other. This is equal to one when nodes are linked directly by an edge, and higher if there are any intermediate nodes and edges that connect the two extremes represented by the given pair. When more than one such path exists, we take the shortest. For a node, the average shortest path is then defined as the average of the shortest path values between the given node and all others in the graph.
- Maximum shortest path: The maximum value of the shortest path calculated over all pairs of vertices in the graph. This is also referred to as the diameter of the graph.
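The definitions above can be made concrete with a small, dependency-free sketch that computes each metric on a toy graph. It is an illustration under stated assumptions rather than the procedure used for the results: edges are treated as undirected, self-loops are assumed absent, and the shortest-path metrics are computed inside the largest connected component only, since the text does not say how unreachable pairs were handled. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an iterable of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop counts from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def graph_metrics(edges):
    adj = build_adjacency(edges)
    n = len(adj)
    m = sum(len(neigh) for neigh in adj.values()) // 2

    # Connected components via repeated breadth-first search.
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)
    lcc = max(components, key=len)

    # Local clustering: linked neighbour pairs / all neighbour pairs.
    local, closed, triples = [], 0, 0
    for node, neigh in adj.items():
        pairs = len(neigh) * (len(neigh) - 1) // 2
        links = sum(1 for a, b in combinations(neigh, 2) if b in adj[a])
        local.append(links / pairs if pairs else 0.0)
        closed += links   # each triangle is counted once per corner
        triples += pairs  # connected triples centred on this node

    # Shortest paths between all ordered pairs inside the LCC (assumption).
    lengths = [d for s in lcc for t, d in bfs_distances(adj, s).items() if t != s]

    return {
        "nodes": n,
        "edges": m,
        "density": 2 * m / (n * (n - 1)),
        "avg_degree": 2 * m / n,
        "max_degree": max(len(neigh) for neigh in adj.values()),
        "components": len(components),
        "lcc": len(lcc),
        "avg_clustering": sum(local) / n,
        "triangles": closed // 3,
        "transitivity": closed / triples if triples else 0.0,
        "avg_shortest_path": sum(lengths) / len(lengths),
        "diameter": max(lengths),
    }


# A triangle (1, 2, 3) with a pendant node 4, plus a separate pair (5, 6).
toy = graph_metrics([(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)])
```

On the toy graph there are two components, one triangle and five connected triples, so the transitivity is 3 × 1 / 5 = 0.6, while the average clustering coefficient is lower (7/18) because three of the six nodes have fewer than two neighbours.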
Table 3.
Metric | Followers | Friends | Mutual |
---|---|---|---|
|Nodes| | 833 | 863 | 607 |
|Edges| | 1273 | 1423 | 958 |
Density | 3.7E-03 | 3.8E-03 | 5.2E-03 |
|Conn| | 172 | 161 | 92 |
LCC | 377 | 435 | 352 |
Avg. Deg. | 3.06 | 3.30 | 3.16 |
Max. Deg. | 53 | 59 | 53 |
Avg Clust. | 0.063 | 0.082 | 0.062 |
|Triang.| | 1869 | 3150 | 1401 |
Trans. | 0.14 | 0.18 | 0.13 |
Avg. sh. | 4.79 | 4.99 | 4.93 |
Diameter | 14 | 16 | 15 |
A mathematical formulation of all the metrics listed above can be found in [30]. All above metrics aim to measure how nodes are linked to each other and, consequently, how they can potentially disseminate content from a node to its neighbouring nodes (friends, followers), and from them to their own neighbours and so on. More specifically:
- Degree (avg, max) and density are essentially measures of graph connectivity in terms of links/relations between nodes. This, in terms of follower/following degrees, means that users can directly consume (see, read) the content posted by other users.
- Average clustering coefficient and transitivity are both clustering metrics that measure how some of the nodes can form dense groups in which each element has strong connections with the others. As a consequence, each piece of information posted by one of these nodes can rapidly spread within the group but disseminates outside the group with more difficulty. Note that if the graph nodes were all connected to each other we would have only one big cluster (this is also expressed by high density values, which can then be seen as a measure of ‘global clustering’). However, usually (as in our graphs) a finite number of clusters are visible, normally having weak connections between each other (weak ties). If no connections at all exist between clusters we would define them as disconnected components. When many nodes are included in one of these clusters the average clustering coefficient values become higher, even if the graph appears to be composed of many distinct clusters.
- Shortest path metrics are a direct measure of how information travels throughout the network, following paths represented by links between a node and its neighbours, between them and their own networks, and so on. The greater the length of the shortest paths from a node to all others in the graph (and so their average), the easier the information can travel from a given node and spread over the network. The flow of information spreads with increasing difficulty beyond the edge of the connected components and clusters of nodes. However, as observed earlier, clusters could still be connected by a small number of links (weak ties [31]) that then act as bridges between cluster pairs and allow information to spread from a vertex to the others, leading to a possible contagion effect (this is reflected by greater values of each node’s shortest paths to all other network nodes).
From the values in Table 3 we can observe that the graphs representing the followers and friends networks are very similar, with the latter having slightly greater degrees and clustering indexes (e.g. average degree, average clustering). This is also reflected in the higher number of triangles and greater transitivity, meaning a slightly more connected graph.
Secondly, we can observe that the graph built with mutually reciprocated links shows very similar values for the majority of the metrics of connectivity, such as maximum and average node degree, clustering coefficients, average shortest path, diameter, and even higher graph density (see Table 3).
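As a quick consistency check, the average degrees and densities of the three graphs can be recomputed from the node and edge counts in Table 3. The formulas used here, 2m/n for the average degree and 2m/(n(n−1)) for the density, are the undirected ones; that they apply is an inference from the fact that they reproduce the reported figures, not something stated in the text.

```python
# (nodes, edges) for each graph, as listed in Table 3.
table3 = {
    "followers": (833, 1273),
    "friends": (863, 1423),
    "mutual": (607, 958),
}


def undirected_summary(n, m):
    """Average degree and density of a simple undirected graph with n nodes, m edges."""
    avg_degree = 2 * m / n
    density = 2 * m / (n * (n - 1))
    return avg_degree, density


summary = {name: undirected_summary(n, m) for name, (n, m) in table3.items()}
for name, (avg_degree, density) in summary.items():
    print(f"{name}: avg degree {avg_degree:.2f}, density {density:.1E}")
```

The recomputed average degrees (3.06, 3.30 and 3.16) match the table, and the densities come out at the order of E-03, with the mutual graph the densest of the three.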
For baseline comparison of social network metrics we refer to three datasets publicly available from the website Konect [30] (the Koblenz Network Collection), which provides large network datasets for scientific research. We will refer to these as ‘baseline network metrics’. In Table 4 we provide network metrics (when available) for the three following datasets of different sizes (all representing Twitter follower networks):
k1 - Twitter (ICWSM): directed network containing information about who follows whom on Twitter.
k2 - Twitter (MPI): asymmetric network containing Twitter ‘follow’ data based on a snapshot taken in 2009.
k3 - Twitter (WWW): follower network from Twitter, containing 1.4 billion directed ‘follow’ edges between 41 million Twitter users.
Table 4.
Metric | k1 | k2 | k3 |
---|---|---|---|
\|Nodes\| | 465,017 | 52.5 M | 41.6 M |
\|Edges\| | 834,797 | 1.9 B | 1.4 B |
Density | 3.2E-06 | 1.4E-07 | 1.6E-07 |
\|Conn\| | – | – | – |
LCC | 465,017 | – | – |
Avg. Deg. | 3.59 | 74.68 | 70.51 |
Max. Deg. | 678 | 3.6 M | 3.1 M |
Avg. Clust. | 0.061 | – | – |
\|Triang.\| | 38,389 | 55.4 B | 34.8 B |
Trans. | – | – | – |
Avg. sh. | 4.59 | – | – |
Diameter | 8 | 18 | 23 |
Although Twitter networks of different size and nature inevitably show different characteristics, the graphs of ‘followers’, ‘friends’ and ‘mutuals’ present a density three orders of magnitude greater than the benchmark datasets ‘k1’, ‘k2’ and ‘k3’ used for comparison (on the order of E-03 rather than E-06). These values drop further with the increasing size of the graphs, thus suggesting that, although of generally low density, the level of interconnectivity between ‘suicidal’ users may be greater than that in these baseline networks. The opposite happens for the average degrees, suggesting instead that these users are more isolated from other users than in the baseline networks. However, the network of ‘suicidal’ users is actually relatively small compared to the baseline networks, and our results show that the measures that express connectivity, such as the average degree and the average clustering coefficient, are comparable between our values and those of the smallest Konect graph k1.
A further published work also provides an analysis of the Twitter ‘follow’ graph, taking a snapshot from the second half of 2012 and defining four networks of different size [28]. The degree of connectivity reported there is very similar to our results, with the range of average degrees varying from 2.83 to 3.34 for the follower graph, from 3.56 to 4.03 for the friend graph, and from 2.59 to 2.83 for the graph representing ‘mutual’ links. The distribution of clustering coefficients is also comparable with our findings (0.19 for nodes of degree 20). This again suggests that the connectivity within the suicidal user set is similar to the generic Twitter network connectivity. The same study also reports an average path length of 4.17 for the ‘mutual’ graph and 4.05 for the directed graph of followers, while we obtain values of 4.79 for the followers and 4.93 for the ‘mutual’ links, providing further evidence of a connectivity among suicidal users which is comparable to that of generic Twitter users.
Moreover, the authors report that 42% of edges in the ‘follow’ graph are reciprocated, whereas our graphs return much higher percentages, with 75% of the ‘follow’ links also having ‘friendship’ links between the two nodes. This result is in line with other recent studies that have identified, in large networks, the presence of sub-communities of members highly associated with each other. Furthermore, the same studies suggest this may be correlated with the high emotional state of these members, as is the case for our network of ‘suicidal’ users, which itself forms a sub-community of the much larger Twitter network.
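The reciprocity comparison can be made concrete with a short sketch. It assumes reciprocity is measured as the fraction of directed ‘follow’ edges whose reverse edge is also present; the exact procedure used to obtain the percentages above is not spelled out in the text, so the function name and the toy edge list are hypothetical.

```python
def reciprocity(follow_edges):
    """Fraction of directed (follower, followed) edges that are reciprocated."""
    edges = {(u, v) for u, v in follow_edges if u != v}  # drop self-loops and duplicates
    if not edges:
        return 0.0
    mutual = sum(1 for u, v in edges if (v, u) in edges)
    return mutual / len(edges)


# Toy example: a and b follow each other; a->c and c->d are one-way links.
toy_follows = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(reciprocity(toy_follows))  # 2 of the 4 directed edges are reciprocated
```

Under this definition each mutual pair contributes two reciprocated edges, so the value is a share of directed edges rather than of node pairs.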
Nevertheless, recording a degree of connectivity comparable to that of other snapshots of more generic Twitter users in terms of social network metrics (apart from some predictable differences from the largest graphs of several million users) is an important result in itself. In fact, our network is formed exclusively by users belonging to the ‘suicidal’ set (having discarded any ‘follow’ and ‘friendship’ links with nodes outside this given set) and has been generated by only considering the authors of a very small sample of distinct Twitter posts (originally fewer than one hundred annotated as ‘suicidal’ and then expanded by considering their duplicates in the collected data). As a consequence, no particularly significant degree of connectivity was expected among this resulting group of users.
5. The retweet graph - measures of communication
This section analyses the graph of retweets, built by looping through S and identifying which users have retweeted posts containing suicidal ideation. This has the effect of further propagating this type of content and may increase the risk of contagion. The retweet graph is a directed graph where the direction of the arrows means ‘has retweeted’. A summary of graph metrics related to the retweet graph is given in Table 5. Only a relatively small percentage of our initial set of users have been retweeted (), as visualised in Fig. 2 suggesting a long-tail distribution. This also means that only 32% of the nodes in the retweet graph are from the initial set S of ‘suicidal’ users.
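A minimal sketch of this construction is given below. It assumes the collected data can be reduced to (retweeter, original author) pairs and that S is available as a Python set; the record layout and function names are illustrative assumptions rather than the collection pipeline actually used.

```python
def build_retweet_graph(retweets, suicidal_users):
    """Keep directed 'has retweeted' edges whose original author is in S."""
    edges = set()
    for retweeter, author in retweets:
        if author in suicidal_users and retweeter != author:
            edges.add((retweeter, author))  # direction: retweeter -> author
    nodes = {n for edge in edges for n in edge}
    return nodes, edges


def share_in_set(nodes, suicidal_users):
    """Fraction of graph nodes that belong to the initial set S."""
    return len(nodes & suicidal_users) / len(nodes) if nodes else 0.0


# Toy example: s1 is retweeted by two external users and by s2;
# the retweet of z is dropped because z is not in S.
S = {"s1", "s2"}
records = [("x", "s1"), ("y", "s1"), ("s2", "s1"), ("x", "z")]
nodes, edges = build_retweet_graph(records, S)
print(len(nodes), len(edges), share_in_set(nodes, S))
```

Because every retained edge has its target in S, the resulting graph only reaches one hop away from the initial set.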
Table 5.
Metric | Re-tw. | Re-tw+Fr. |
---|---|---|
\|Nodes\| | 3209 | 3866 |
\|Edges\| | 2211 | 3469 |
Density | 4.3E-04 | 4.6E-04 |
LCC | 138 | 827 |
\|Conn\| | 1002 | 1023 |
Avg. Deg. | 1.38 | 1.79 |
Max. Deg. | 44 | 69 |
Avg. Clust. | 9.4E-03 | 0.013 |
\|Triang.\| | 9 | 1878 |
Trans. | 1.4E-03 | 0.08 |
Avg. sh. | 5.05 | 5.43 |
Diameter | 13 | 15 |
In Table 5 we can observe very low values for the connectivity metrics (such as degree and clustering), together with a much higher number of disconnected components, in comparison with those obtained from the follower and friend graphs. This is, however, a consequence of the fact that we focused intentionally only on posts included in the annotated set of human classified suicidal tweets, thus only considering retweets of this particular group of users without incorporating those who have not been identified as posting suicidal ideation. As a result, the retweet graph does not include any edges without at least one end included in the set S.
Therefore, our collection only explored retweet links going one hop away from our initial set of users, thus missing potential triangles among triads of nodes when these were not all included in our given set (as in the majority of cases). This resulted in a reduction in the indexes of transitivity and clustering, whereas the average degree still achieves a third of the values obtained for the followers and friends networks.
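As a quick arithmetic check, the density and average degree of the two graphs in Table 5 can be recomputed from their node and edge counts. The sketch below assumes the standard formulas for a simple undirected graph, density = 2|E| / (|V|(|V| − 1)) and average degree = 2|E| / |V|; the text does not state which formulas were used, so this is an assumption that happens to agree with the rounded table entries.

```python
def density(n_nodes, n_edges):
    """Density of a simple undirected graph."""
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))


def average_degree(n_nodes, n_edges):
    """Average degree of an undirected graph."""
    return 2.0 * n_edges / n_nodes


# Node and edge counts taken from Table 5.
for name, n, m in [("retweet", 3209, 2211), ("retweet + friendship", 3866, 3469)]:
    print(f"{name}: density={density(n, m):.1e}, avg degree={average_degree(n, m):.2f}")
```

The low average degree (below 2) is consistent with a graph made mostly of small star-like components around individual authors.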
However, from the analysis of metrics other than connectivity indexes we can observe interesting properties. [32] reports an extensive study of a large dataset from a 2009 snapshot of the Twitter graph, analysing hundreds of thousands of users and their retweets. It concludes that, even if the retweet graph shows the same scale-free characteristics, it presents a higher degree of connectivity than typical online networks. In particular, the authors observed larger connected components and higher clustering coefficients (greater than in the follower graph), resulting in behaviour closer to real-world networks in terms of content dissemination. The latter property is captured by the values of the average shortest path (4.8) and diameter of the graph (8.5). Similar results are also reported in [33], which analysed over four thousand retweet groups (for a total of about 26,000 Tweets) collected over the year 2011. The authors obtained a longest shortest path over all groups of 9 edges (although the average shortest path was much lower and only equal to 2). Our results, presented in Table 5, show higher values of both the diameter (maximum shortest path of 13/15) and average shortest path (between 5 and 5.5). This finding suggests a greater spread of suicidal ideation content than that observed for typical Twitter content in the comparable studies.
The average shortest path in our retweet graph is also in line with that reported in a public Konect dataset (5.45) which represents a much larger Twitter network of online interactions (‘mentions’), with three million nodes and over ten million edges [30]. This provides further evidence that the ‘suicidal’ user network S presents properties similar to large scale communication networks, thus suggesting a high level of propagation of such content within the virtual community and some potential for information spread (and a possible contagion effect).
The propagation of information can also be explained by looking at particulars of the retweet graph (see Fig. 8). The graph appears highly disconnected (very sparse, with over one thousand connected components): most of the users are connected only within small disconnected sub-graphs, usually small hubs with a node ∈ S (a ‘suicidal’ node) at the centre and a small group of nodes external to S at the edges. However, the relatively high shortest path values suggest the existence of weak links/bridges that connect different hubs together.
Even if not numerous, these weak links and bridges do exist in our graph, as observable from Fig. 8. Here nodes belonging to S are represented in red while ‘external’ nodes are coloured in blue. The size of the user/node is proportional to the number of retweets of original ‘suicidal’ tweets posted by that user. We can observe a number of ‘hubs’ where the centre of the hub is a user that posted suicidal content which has subsequently been retweeted a number of times, as indicated by the considerable size of these nodes. Surrounding the hub are retweeters who are (in the majority of cases) external nodes (not in S), thus allowing content dissemination outside our initial set of suicidal users. Once again, this provides evidence of a possible contagion effect. Also note the importance of a number of ‘bridge nodes’ that have retweeted (and so linked together) pairs of different hubs. In Fig. 8, edges represent the relation ‘has retweeted’. Edges between nodes external to S and internal ones are coloured in blue and form the large majority, whereas only a few links (in red) have both ends belonging to set S (red nodes).
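The hub and bridge roles described here can be sketched in a few lines. The example assumes retweet edges are (retweeter, author) pairs, treats a hub as an author in S retweeted by at least `min_hub_size` distinct users, and treats a bridge as any user who has retweeted two or more distinct authors; the threshold and these working definitions are illustrative assumptions, since the text identifies hubs and bridges visually from Fig. 8.

```python
from collections import defaultdict


def hubs_and_bridges(retweet_edges, suicidal_users, min_hub_size=3):
    """Return (hubs, bridges, external_bridges) for a retweet edge list."""
    retweeters_of = defaultdict(set)  # author -> users who retweeted them
    authors_of = defaultdict(set)     # retweeter -> authors they retweeted
    for retweeter, author in retweet_edges:
        retweeters_of[author].add(retweeter)
        authors_of[retweeter].add(author)
    # Hubs: authors in S with enough distinct retweeters.
    hubs = {a: len(r) for a, r in retweeters_of.items()
            if a in suicidal_users and len(r) >= min_hub_size}
    # Bridges: users whose retweets link two or more authors together.
    bridges = {r for r, authors in authors_of.items() if len(authors) >= 2}
    external_bridges = bridges - set(suicidal_users)
    return hubs, bridges, external_bridges


# Toy example: x3 retweets both s1 and s2, linking the two hubs.
S = {"s1", "s2"}
edges = [("x1", "s1"), ("x2", "s1"), ("x3", "s1"),
         ("x3", "s2"), ("x4", "s2"), ("s2", "s1")]
print(hubs_and_bridges(edges, S))
```

External bridges are the nodes of most interest for dissemination outside S, since they connect hubs without themselves belonging to the initial set.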
This is also in line with recent studies, see [34], that emphasise the importance of ‘weak-links’ within the Twitter network for the dissemination and sharing of content.
5.1. Combining friendship and retweet links
As a final step, we merged the two graphs of followers and retweeters, thus adding ‘friendship’ edges to nodes in the retweet graph as well as adding users from S that had ‘follow’ links but have not retweeted each other. The purpose of this is to identify levels of propagation between suicidal users.
The network metrics for this ‘combined’ graph are given in the second column of Table 5. Here we can observe that the size of the largest connected component, the number of edges, the degree, and the clustering indexes have all increased, suggesting a very dense and connected community with high volumes of propagation.
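The effect of merging on the largest connected component can be illustrated with a small sketch. It assumes both edge sets are treated as undirected for the purpose of counting components and uses a simple union-find structure; the function names and toy data are hypothetical and do not reproduce the graphs analysed here.

```python
def component_sizes(edges):
    """Sizes of connected components (largest first), ignoring edge direction."""
    parent = {}

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components
    sizes = {}
    for node in parent:
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def combine(retweet_edges, friendship_edges):
    """Union of the two edge sets, as in the 'combined' graph."""
    return set(retweet_edges) | set(friendship_edges)


# Toy example: three isolated retweet pairs; one friendship link joins two of them.
retweets = [("x1", "s1"), ("x2", "s2"), ("x3", "s3")]
friendships = [("s1", "s2")]
print(component_sizes(retweets))                         # before merging
print(component_sizes(combine(retweets, friendships)))   # after merging
```

In the toy data a single friendship edge between two authors doubles the size of the largest component, which mirrors in miniature the growth reported in Table 5.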
This is visible in Fig. 9, which also visualises how these links are related to each other, since ‘friendship’ means potentially consuming a user's content while ‘retweeting’ is a clearer index of content already consumed. In particular we are interested in retweets that are made by users who are not already part of the ‘suicidal’ set S (blue indicates nodes ∈ S). From the figure we can observe how these retweets (represented as red edges) are primarily located on the outer circle and produced by retweeting components of small size (mostly pairs) that appear in isolation from the rest of the network.
This is further supported by the shortest path metric values in Table 5 not being affected to a significant extent by the addition of the ‘friendship’ links. In fact, although degree and clustering indexes increase because of their addition, the shortest paths do not appear to shorten (but instead slightly increase). A shorter length might be expected if the majority of retweets were made by users within the suicidal set who are already connected by ‘friendship’ links. Note that this result is in line with other recent studies, such as [28], which reports longer shortest path values for larger Twitter graphs, and is in contradiction with what has been observed for other social networks, where the average path length has been suggested to decrease with the size of the graph [35].
From this figure we can again observe how, beside a dense network of friendship links among ‘suicidal’ users in the inner part of the graph (blue edges), retweeting of suicidal content is performed by users who are not connected and do not belong to S (red edges). This suggests that the propagation of suicidal ideation may not occur among ‘suicidal’ users but instead the dissemination of this specific type of content could be enacted by users who are not directly connected to them.
6. Conclusion
In this paper we have analysed the graph characteristics of a set of 3535 Twitter users who have posted content that human annotators agreed should be classified as containing evidence of suicidal thinking. For the purposes of the research, we refer to these users as ‘suicidal users’.
We conducted a range of social network analysis experiments by analysing the social graphs derived by identifying the followers, friends, mutual friends (where both users follow each other), and retweets of suicidal users. Each node in the social graphs belonged to the given set of ‘suicidal’ users. A number of significant characteristics and properties have been observed by analysing these graphs.
With respect to connectivity, the friends and followers graphs of suicidal users did not present major differences in terms of social network metrics when compared to other literature reporting Twitter snapshots of more generic users (apart from predictable differences from very large networks of millions of users). However, our results showed that while the average user connectivity metrics appear similar to baseline networks, the reciprocity of either follower/following relationships or ‘mutual’ links between suicidal users is significantly higher (up to 73% as opposed to 42% in other studies), suggesting a more tightly-bound community than non-suicidal networks.
From the investigation into communication, our study found that the values of the average shortest path of retweets of suicidal content were higher than in previous studies that reported on general retweet path length. Our results found an average of 5, while other research reported metrics between 2 and 4.8. This finding suggests a greater spread of suicidal ideation content than that reported in the related studies. Another point of interest with this result is that this is similar to the interaction measures reported by a very large Twitter network of over 3 million nodes (avg. shortest path 5.45), thus providing evidence of properties of large scale communication networks within a very small network and suggesting a high level of propagation of such content within the virtual community and some potential for information spread.
The retweet graph was composed of highly disconnected hubs (usually of small size) that propagate suicidal content between small networks via a number of users acting as bridges, demonstrating a potential for information cascade and dissemination outside the set S of authors posting suicidal intent content (with a possible contagion effect). The relatively high shortest path values suggest the existence of these ‘weak-links’/bridges that connect together different smaller communities and, although not particularly numerous, can provide a route to propagation. While content is posted by suicidal users, retweeters are (in the majority of cases) external nodes (i.e. not posting suicidal ideation), thus allowing content dissemination outside our initial group of suicidal users. Once again, this provides evidence of a contagion effect, which has long been recognised in the suicidology field. The findings have implications for suicide prevention and especially the urgent need to develop and evaluate online interventions [36].
7. Future work
While we have identified some interesting and promising results, future research is needed in order to overcome the limitations of our analysis, which was conducted on a set of annotated posts of limited size. In fact, even if we started from a relatively large dataset, the posts classified as containing suicidal intent did not appear to be included in large percentages (only about 10% of tweets harvested using suicide-related keywords) because of the inherent characteristics of this type of user and content. We have developed a machine classification method that is able to automatically distinguish between text containing suicidal ideation and other forms of suicidal communication, and could be used to derive a much larger dataset from social media streams for further validation and experimentation [37].
Furthermore, the analysis could be extended to more than one-hop-away neighbours (friends of friends, retweeters of the retweeters), in order to look at the characteristics of these two-and-more-hops neighbours. For example, by analysing samples of their timeline Tweets, we can investigate whether, besides retweeting suicidal content, these users may have posted a similar type of content and could also be classified as ‘suicidal’ users (using the machine classification method in [37]). Further insights could also be derived by analysing the demographic characteristics (such as age and gender) of this type of user and their social network of friends, followers, and retweeters.
Finally, it would also be interesting to extend this study by conducting a similar analysis over a longer term, by increasing the duration of the data collection and looking at the regularity and periodicity characteristics of such content. This would allow for the investigation of the evolution of suicidal content over a longer period of time and for further reflections on the social networks of these users, perhaps including comparison with other social movements (see [35] for reference).
Acknowledgement
This research is funded by the Department of Health Policy Research Programme (Understanding the Role of Social Media in the Aftermath of Youth Suicides, Project Number 023/0165), and by the Children and Young People’s Research Network as part of the research infrastructure for Wales funded by NISCHR, Welsh Government.
Footnotes
Contributor Information
Gualtiero B. Colombo, Email: g.colombo@cs.cf.ac.uk, ColomboG@cardiff.ac.uk.
Pete Burnap, Email: BurnapP@cardiff.ac.uk.
Andrei Hodorog, Email: HodorogA@cardiff.ac.uk.
Jonathan Scourfield, Email: Scourfield@cardiff.ac.uk.
References
- 1. Pirkis J., Blood R.W. Suicide and the media. Crisis: J. Crisis Interv. Suicide Prev. 2001;22(4):155–162. doi: 10.1027//0227-5910.22.4.155.
- 2. Gould M., Jamieson P., Romer D. Media contagion and suicide among the young. Am. Behav. Sci. 2003;46(9):1269–1284.
- 3. Daine K., Hawton K., Singaravelu V., Stewart A., Simkin S., Montgomery P. The power of the web: a systematic review of studies of the influence of the internet on self-harm and suicide in young people. PloS One. 2013;8(10):e77555. doi: 10.1371/journal.pone.0077555.
- 4. Sloan L., Morgan J., Burnap P., Williams M. Who tweets? deriving the demographic characteristics of age, occupation and social class from twitter user meta-data. PLoS One. 2015;10(3). doi: 10.1371/journal.pone.0115545.
- 5. Haw C., Hawton K., Niedzwiedz C., Platt S. Suicide clusters: a review of risk factors and mechanisms. Suicide Life-Threat. Behav. 2013;43(1):97–108. doi: 10.1111/j.1943-278X.2012.00130.x.
- 6. Won H.-H., Myung W., Song G.-Y., Lee W.-H., Kim J.-W., Carroll B.J., Kim D.K. Predicting national suicide numbers with social media data. PloS One. 2013;8(4):e61809. doi: 10.1371/journal.pone.0061809.
- 7. J. Jashinsky, S.H. Burton, C.L. Hanson, J. West, C. Giraud-Carrier, M.D. Barnes, T. Argyle, Tracking suicide risk factors through Twitter in the US (2013).
- 8. Bearman P.S., Moody J. Suicide and friendships among american adolescents. Am. J. Public Health. 2004;94(1):89–95. doi: 10.2105/ajph.94.1.89.
- 9. Kawachi I., Colditz G.A., Ascherio A., Rimm E.B., Giovannucci E., Stampfer M.J., Willett W.C. A prospective study of social networks in relation to total mortality and cardiovascular disease in men in the usa. J. Epidemiol. Community Health. 1996;50(3):245–251. doi: 10.1136/jech.50.3.245.
- 10. Gould M.S. Suicide and the media. Ann. NY Acad. Sci. 2001;932(1):200–224. doi: 10.1111/j.1749-6632.2001.tb05807.x.
- 11. Magne-Ingvar U., Öjehagen A., Träskman-Bendz L. The social network of people who attempt suicide. Acta Psychiatr. Scand. 1992;86(2):153–158. doi: 10.1111/j.1600-0447.1992.tb03244.x.
- 12. Spasić I., Burnap P., Greenwood M., Arribas-Ayllon M. A naïve bayes approach to classifying topics in suicide notes. Biomed. Inf. Insights. 2012;5(Suppl 1):87. doi: 10.4137/BII.S8945.
- 13. Pestian J., Nasrallah H., Matykiewicz P., Bennett A., Leenaars A. Suicide note classification using natural language processing: a content analysis. Biomed. Inf. Insights. 2010;2010(3):19. doi: 10.4137/BII.S4706.
- 14. Moreno M.A., Jelenchick L.A., Egan K.G., Cox E., Young H., Gannon K.E., Becker T. Feeling bad on facebook: depression disclosures by college students on a social networking site. Depress. Anxiety. 2011;28(6):447–455. doi: 10.1002/da.20805.
- 15. De Choudhury M., Counts S., Horvitz E.J., Hoff A. Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM; 2014. Characterizing and predicting postpartum depression from shared facebook data; pp. 626–638.
- 16. Balani S., De Choudhury M. Proceedings of CHI’15: 33rd Annual ACM Conference on Human Factors in Computing Systems. 2015. Detecting and characterizing mental health related self-disclosure in social media; p. to appear.
- 17. Shaw L.H., Gant L.M. In defense of the internet: the relationship between internet communication and depression, loneliness, self-esteem, and perceived social support. Cyber Psychol. Behav. 2002;5(2):157–171. doi: 10.1089/109493102753770552.
- 18. Merolli M., Gray K., Martin-Sanchez F. Health outcomes and related effects of using social media in chronic disease management: a literature review and analysis of affordances. J. Biomed. Inf. 2013;46(6):957–969. doi: 10.1016/j.jbi.2013.04.010.
- 19. Poulin C., Shiner B., Thompson P., Vepstas L., Young-Xu Y., Goertzel B., Watts B., Flashman L., McAllister T. Predicting the risk of suicide by analyzing the text of clinical notes. PloS One. 2014;9(1):e85733. doi: 10.1371/journal.pone.0085733.
- 20. Abboute A., Boudjeriou Y., Entringer G., Azé J., Bringay S., Poncelet P. Proceedings of the Natural Language Processing and Information Systems. Springer; 2014. Mining twitter for suicide prevention; pp. 250–253.
- 21. Hsiung R.C. A suicide in an online mental health support group: reactions of the group members, administrative responses, and recommendations. Cyber Psychol. Behav. 2007;10(4):495–500. doi: 10.1089/cpb.2007.9999.
- 22. Quercia D., Capra L., Crowcroft J. Proceedings of the ICWSM. 2012. The social world of twitter: topics, geography, and emotions.
- 23. Lerman K., Ghosh R. Proceedings of the ICWSM. Vol. 10. 2010. Information contagion: an empirical study of the spread of news on digg and twitter social networks; pp. 90–97.
- 24. Thelwall M., Buckley K., Paltoglou G., Cai D., Kappas A. Sentiment strength detection in short informal text. J. Am. Soc. Inf. Sci. Technol. 2010;61(12):2544–2558.
- 25. Barbosa L., Feng J. Proceedings of the 23rd International Conference on Computational Linguistics: Posters. Association for Computational Linguistics; 2010. Robust sentiment detection on twitter from biased and noisy data; pp. 36–44.
- 26. Pak A., Paroubek P. Proceedings of the LREC. 2010. Twitter as a corpus for sentiment analysis and opinion mining.
- 27. Burnap P., Rana O.F., Avis N., Williams M., Housley W., Edwards A., Morgan J., Sloan L. Detecting tension in online communities with computational twitter analysis. Technol. Forecast. Soc. Change. 2013;95:96–108.
- 28. Myers S.A., Sharma A., Gupta P., Lin J. Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee; 2014. Information network or social network? the structure of the twitter follow graph; pp. 493–498.
- 29. Ediger D., Jiang K., Riedy J., Bader D.A., Corley C., Farber R., Reynolds W.N. Proceedings of the 39th International Conference on Parallel Processing (ICPP) IEEE; 2010. Massive social network analysis: mining twitter for social good; pp. 583–593.
- 30. Kunegis J. Proceedings of the 22nd International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee; 2013. Konect: the koblenz network collection; pp. 1343–1350.
- 31. Granovetter M.S. The strength of weak ties. Am. J. Sociol. 1973:1360–1380.
- 32. D.R. Bild, Y. Liu, R.P. Dick, Z.M. Mao, D.S. Wallach, Aggregate characterization of user behavior in twitter and analysis of the retweet graph, arXiv preprint arXiv:1402.2671 (2014).
- 33. Webberley W., Allen S., Whitaker R. Proceedings of the Workshop on Mobile and Online Social Networks (MOSN) IEEE; 2011. Retweeting: a study of message-forwarding in twitter; pp. 13–18.
- 34. Arnaboldi V., Conti M., Passarella A., Dunbar R. Proceedings of the First ACM Conference on Online Social Networks. ACM; 2013. Dynamics of personal social relationships in online social networks: a study on twitter; pp. 15–26.
- 35. Leskovec J., Kleinberg J., Faloutsos C. Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM; 2005. Graphs over time: densification laws, shrinking diameters and possible explanations; pp. 177–187.
- 36. Jacob N., Scourfield J., Evans R. Suicide prevention via the internet: a descriptive review. Crisis: J. Crisis Interv. Suicide Prev. 2014;35(4):261. doi: 10.1027/0227-5910/a000254.
- 37. Burnap P., Colombo G., Scourfield J. Proceedings of the 26th ACM International Conference on Hypertext and Social Media. ACM; 2015. Machine classification and analysis of suicide-related communication on twitter.