Abstract
Objective tools to assess suicide risk are needed to determine when someone is at imminent risk. This pilot laboratory investigation utilized a within-subjects design to identify patterns in text messaging (SMS) unique to high-risk periods preceding suicide attempts. Individuals reporting a history of suicide attempt (N=33) retrospectively identified past attempts and periods of lower risk (e.g., suicide ideation). Language analysis software scored 189,478 text messages to capture three psychological constructs: self-focus, sentiment, and social engagement. Mixed-effects models tested whether these constructs differed in general (means) and over time (slopes) two weeks before a suicide attempt, relative to lower-risk periods. Regarding mean differences, no language features uniquely differentiated suicide attempts from other episodes. However, when examining patterns over time, anger increased and positive emotion decreased to a greater extent as one approached a suicide attempt. Results suggest private electronic communication has the potential to provide real-time digital markers of suicide risk.
Keywords: Suicide, Text Mining, Sentiment Analysis, Digital Data, Digital Phenotyping
Suicide is a serious public health problem and a leading cause of death around the world. In fact, more deaths occur by suicide than by all other interpersonal violence, including war and homicide, combined (World Health Organization, 2009). Despite growing awareness of and research into suicide, rates today are very similar to those from the 1950s (Centers for Disease Control and Prevention, 2014), indicating a critical need for better ways to identify and intervene with individuals at risk of suicide. Using a novel within-subjects design, the current pilot study sought to analyze private text messages (Short Message Service; SMS) of non-fatal suicide attempters and identify patterns uniquely indicative of acute suicide risk (i.e., communication patterns immediately preceding suicide attempts vs. during periods of lower risk, such as suicide ideation only or depressed mood). In doing so, we aimed to improve our ability to assess suicide risk dynamically and in real time.
Need for Better Identification of Acute Suicide Risk
Given the staggering toll suicide takes on society, it is surprising that our methods for identifying when individuals are at highest risk of suicide remain ineffective. One reason is that suicide researchers over the past several decades have focused primarily on identifying general risk factors for suicide. As a result, our ability to identify groups of individuals at elevated risk is impressive, particularly for such a serious yet relatively rare clinical outcome. For example, a recent model that included known risk factors for suicide attempt, estimated using World Health Organization data, accounted for 80.3% of the variance (Nock, Borges, & Ono, 2012). However, such general risk factors fail to tell us when someone is at imminent risk of suicidal behavior. In other words, even if we know which individuals may be most vulnerable at some point, we currently lack the tools to assess whether or when a given individual will act on suicidal thoughts by making an attempt. By comparing multiple periods of high vs. low suicide risk, all within a sample of suicide attempters, the current study sought to trace how suicide risk changes dynamically and thereby to better understand proximal risk for a suicide attempt.
Our chief method for assessing acute or short-term suicide risk remains clinicians’ judgments, which unfortunately have been shown not to accurately predict future suicidal behaviors (Nock, Park, et al., 2010). The difficulty of clinically assessing risk stems from the near-universal reliance on self-report, which is highly problematic for several reasons: those at greatest risk may be motivated to conceal their thoughts (e.g., to avoid or gain release from hospitalization), and people may lack the ability to accurately assess the factors affecting their current risk. Thus, there is an urgent need for novel, data-driven tools to assess acute suicide risk. In light of these challenges, recent work has sought to use behavioral tools to overcome problems associated with self-report. The current study took a similar behavioral approach via a within-person examination of electronic personal communications. This approach avoids the complications inherent in either patient or clinician report. It also moves away from traditional between-subjects comparisons, in which people who have and have not attempted suicide differ on many features beyond attempt status alone.
Novel Approaches Needed to Study Low Base Rate Behaviors
Despite frequent calls to develop better ways to identify risk of serious suicidal behaviors, researchers studying suicide face tremendous methodological challenges, several of which we addressed in this study. First, suicidal behaviors have low base rates, which makes it difficult to obtain samples large enough to prospectively predict future suicidal behaviors. To illustrate the problem, one recent study estimated the 12-month prevalence of suicide attempt among adults at 0.3% (Borges et al., 2010). This means that more than 300 unselected individuals (1/0.003 ≈ 333) would need to be followed for one year to expect a single suicide attempt. Further complicating this problem, the progression to a suicide attempt (e.g., the decision making and planning associated with an attempt) usually begins less than a week before the attempt (Millner, Lee, & Nock, 2016), suggesting the critical period to examine is very narrow. We reconstructed the timeline of recent suicidal behaviors through a retrospective clinical interview. This allowed us to approximate a prospective design, examining which features of text communications preceded suicide attempts while overcoming the power limitations of a ‘true’ prospective study.
Second, the subjects of greatest interest in suicide research, suicide completers, cannot be directly studied (Millner, Lee, & Nock, 2015). Therefore, researchers must rely on individuals with non-lethal forms of suicidal thoughts and behaviors. Due again to low base rates, many studies use less severe phenomena, such as suicide ideation, as outcome measures. The shortcoming of this approach is that prior research suggests the risk factors associated with suicide ideation differ from those associated with more serious suicidal behaviors, such as suicide attempts (May & Klonsky, 2016; Nock, Hwang, Sampson, & Kessler, 2010). This study recruited only individuals with a history of acting on their suicidal thoughts (i.e., actual suicide attempts) or of being on the cusp of acting on them (i.e., aborted or interrupted suicide attempts). It thereby focused on the behaviors most strongly associated with and predictive of suicide completion.
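The base-rate arithmetic in the preceding paragraph can be made explicit with a short sketch. It assumes the 0.3% figure applies uniformly to an unselected sample. As a simplification not made in the original, it also assumes that attempts occur independently across people (a binomial model). The function names are illustrative.

```python
import math
from fractions import Fraction

# Assumed 12-month prevalence of suicide attempt (0.3%), kept exact as a Fraction.
ANNUAL_RATE = Fraction(3, 1000)

def expected_attempts(n, rate=ANNUAL_RATE):
    """Expected number of attempts among n unselected people over 12 months."""
    return n * rate

def n_for_expected(k, rate=ANNUAL_RATE):
    """Smallest sample size whose expected number of attempts is at least k."""
    return math.ceil(Fraction(k) / rate)

def prob_at_least_one(n, rate=ANNUAL_RATE):
    """P(at least one attempt), assuming independent individuals (binomial model)."""
    return 1.0 - (1.0 - float(rate)) ** n

print(n_for_expected(1))                 # 334 (333 people yield only 0.999 expected attempts)
print(round(prob_at_least_one(334), 3))  # about 0.633
```

Even with an expected count of one attempt, there is roughly a one-in-three chance of observing none. This is why reconstructing past attempts is attractive for a small pilot sample.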
Rise of Digital Text Data
The rising use of smartphones and content-sharing services, such as email, blogs, crowd-sourced review sites, and social media, has resulted in a proliferation of unstructured textual data. These data provide a rich source of information that can be analyzed to extract characteristics of the individual (Cambria, Schuller, Xia, & Havasi, 2013; Kagan, Rossini, & Sapounas, 2013). For example, the field of sentiment analysis uses machine learning and natural language processing to capture an author’s intended sentiment (e.g., attitude, opinion, or emotional state) from subjective textual data. Such analytic approaches have recently garnered the attention of the mental health community, including suicide researchers. Recent studies have applied text analytic approaches to clinical notes to identify long-term predictors of suicide (Hammond & Laundry, 2014; Kagan et al., 2013) and emotions predictive of suicide (Pestian, Matykiewicz, & Linn-Gust, 2012; Sohn et al., 2012; Yang, Willis, De Roeck, & Nuseibeh, 2012). Key findings include that early childhood abuse reported in clinical records predicts suicide attempts and that computer algorithms are more accurate than clinicians in distinguishing genuine from fake suicide notes.
Again, however, such approaches are better suited to telling us who is at risk than when someone is at risk. The ubiquity of mobile technology, including private text data, offers ripe opportunities for the real-time quantification of individual-level human behavior, known as “digital phenotyping” (Onnela & Rauch, 2016; Torous, Kiang, Lorme, & Onnela, 2016). In this study, we collected participants’ smartphone text messaging data and examined their use of language over time. This allowed insight into how communication patterns changed as an individual drew closer to a suicide attempt (Gunn & Lester, 2012). To analyze various properties of text communications, we used Linguistic Inquiry and Word Count (LIWC; Pennebaker, Boyd, Jordan, & Blackburn, 2015), a tool developed by James Pennebaker. LIWC searches a text file and calculates normalized word counts for nearly 6,400 words, word stems, and emoticons that have previously been categorized into a number of linguistic and psychological dimensions. Private digital communication is one of numerous data streams potentially useful for digital phenotyping. It is an ideal source of data for suicide research because it provides ecologically valid data that accumulate automatically. These data are thus resistant to biases common to the research process, such as demand characteristics or efforts at impression management. To our knowledge, this is also the first study to collect and analyze private text messaging data in relation to suicide (or any other clinical outcome).
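As a rough illustration of dictionary-based scoring, the sketch below computes LIWC-style category percentages for a single message. The three categories and their entries are invented for illustration and are not LIWC's proprietary dictionary. The simple regular-expression tokenizer is also an assumption: it ignores emoticons, which LIWC does score. Following the LIWC dictionary convention, a trailing `*` marks a word stem.

```python
import re

# Hypothetical toy dictionary in the LIWC style; a trailing "*" marks a word stem.
TOY_DICTIONARY = {
    "i": ["i", "me", "my", "mine", "myself"],
    "posemo": ["happy", "love*", "good", "great"],
    "anger": ["hate*", "angry", "mad"],
}

TOKEN_RE = re.compile(r"[a-z']+")

def tokenize(text):
    """Lowercase a message and split it into word tokens (simplified tokenizer)."""
    return TOKEN_RE.findall(text.lower())

def matches(token, entry):
    """True if a token matches a dictionary entry, honoring trailing-* stems."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry

def score_message(text, dictionary=TOY_DICTIONARY):
    """Return the word count (WC) and the percentage of words in each category."""
    tokens = tokenize(text)
    wc = len(tokens)
    scores = {"WC": wc}
    for category, entries in dictionary.items():
        hits = sum(1 for t in tokens if any(matches(t, e) for e in entries))
        scores[category] = 100.0 * hits / wc if wc else 0.0
    return scores

print(score_message("I love my dog but I hate Mondays"))
# {'WC': 8, 'i': 37.5, 'posemo': 12.5, 'anger': 12.5}
```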
Communication Features of Interest: Self-Focus, Sentiment, and Social Engagement
The main objective of this study was to test whether features of text messaging data could identify and differentiate increasingly severe levels of suicide risk. We tested whether these characteristics differed both in general (mean differences) and over time (slope differences) between periods of high and lower suicide risk. High-risk periods were those prior to suicide attempts; lower-risk periods were, in decreasing order of risk, suicide ideation, depressed mood, or positive mood. The communication characteristics we focused on were chosen based on theoretical interest and prior research supporting their relevance to suicide-related outcomes.¹
First, one theory of suicide posits that suicide is a means of escaping negative self-focus (Baumeister, 1990). It follows that a feedback loop of increasing self-focus and painful awareness of one’s own failures may lead someone to act on suicidal thoughts as a means of escape. Indeed, previous research has shown that suicidal individuals tend to be more self-focused in their communication. One study found that poets who completed suicide used first-person pronouns to a much greater extent than poets who did not (Stirman & Pennebaker, 2001). Furthermore, in transcribed verbal interviews with suicidal and control adolescent inpatients, use of first-person pronouns was significantly higher among suicidal than control participants (Venek, Scherer, Morency, Rizzo, & Pestian, 2014). Lastly, among users of a popular online forum, first-person pronoun use was associated with a transition from posting in a general mental health forum to posting in a suicide forum (De Choudhury, Kiciman, Dredze, Coppersmith, & Kumar, 2016). In the current study, we tested whether use of first-person pronouns (as an indicator of self-focus) was greater prior to suicide attempts than during episodes of suicide ideation, depressed mood, or positive mood.
Second, prior research has identified depressed affect (Bulik, Carpenter, Kupfer, & Frank, 1990), hopelessness (Hawton, Casanas, Haw, & Saunders, 2013; Smith, Alloy, & Abramson, 2006), and anxiety (Nock, Deming, et al., 2012) as important risk factors for suicide. This suggests that individuals at suicide risk may use language expressing these emotions at higher rates and with greater negative valence. In support of this idea, use of positive and negative emotion words in transcribed verbal interviews differed significantly between suicidal adolescent inpatients and control participants (Venek et al., 2014). In the current study, we tested whether attempt episodes showed greater use of negative emotion words and less use of positive emotion words (as indicators of negative sentiment). We also tested whether attempt episodes involved greater use of words related to the concept of death.
Third, according to the interpersonal theory of suicide (Joiner, 2005; Van Orden et al., 2010), suicide may result from feelings of perceived burdensomeness and thwarted belongingness. In theory, social support should combat such feelings and increase feelings of connectedness. Previous research examining between-subjects differences in perceived social support indirectly supports this hypothesis. In a study of Twitter users, currently suicidal, compared to non-suicidal, individuals reported significantly less belongingness and higher burdensomeness (Braithwaite, Giraud-Carrier, West, Barnes, & Hanson, 2016). In another study, perceived social support from family was lower for hospital emergency department patients with (vs. without) a past suicide attempt (Thompson, Kaslow, Short, & Wyckoff, 2002). In the current study, we tested whether suicide attempters demonstrated greater signs of disengagement from and burdensomeness on their social support networks prior to suicide attempts compared to other episodes by examining patterns in outgoing vs. incoming messages.
Overview and Hypotheses
In this pilot investigation, we proposed a novel way of using digital data streams, specifically private electronic communication, to identify unique textual patterns that occur in advance of suicide attempts and during periods of heightened suicide risk. Specifically, we asked participants with a history of suicide attempt(s) to retrospectively identify and characterize different periods of their lives: suicide attempts (defined as actual, interrupted, and aborted attempts), suicide ideation, depressive episodes, and periods of positive mood. We then quantitatively compared whether and in what ways their text messages from periods of acute suicide risk differed from those of other periods. Acute-risk periods were defined as the two weeks preceding a suicide attempt, to capture the escalation toward an attempt. Comparison periods carried moderate risk (suicide ideation) or minimal/no suicide risk (depressed/positive mood). We compared text messages across episodes within each person on a set of features selected a priori, testing not only for overall mean differences between episodes but also for differences in day-to-day change over time. In this way, we aimed to combine a rich digital dataset and quantitative text analytic methods with laboratory research methodology to address a critical public health problem, in the service of improving our ability to assess and identify suicide risk in real time.
Although no studies to date have examined text messaging content or any other private (i.e., not publicly available) electronic personal communications, we had several hypotheses based on the psychological theories of suicide previously discussed. Using a within-subjects approach, we hypothesized that characteristics of text messaging content during periods of higher suicide risk would differ from those of lesser risk. Specifically, messages prior to a suicide attempt would show three patterns. The first was increased self-focus (i.e., greater first-person singular pronoun usage). The second was greater negative emotional content (i.e., greater frequency of negative affect words, in general and specifically related to anxiety, anger, and sadness, and lesser frequency of positive affect words). The third was decreased social engagement (i.e., a lower ratio of sent to received text messages).
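The social-engagement hypothesis relies on a ratio of sent to received messages. A minimal sketch of computing that ratio per day follows. It assumes each message has already been reduced to a calendar date and a direction label. The labels `"sent"` and `"received"`, and the choice to return `None` for days with no incoming messages, are assumptions rather than details reported in the study.

```python
from collections import defaultdict
from datetime import date

def daily_sent_received_ratio(messages):
    """Ratio of sent to received messages for each calendar day.

    `messages` is an iterable of (day, direction) pairs, with direction
    "sent" or "received" (hypothetical labels). Any other label raises
    KeyError. Days with no received messages map to None rather than
    dividing by zero.
    """
    counts = defaultdict(lambda: {"sent": 0, "received": 0})
    for day, direction in messages:
        counts[day][direction] += 1
    return {
        day: (c["sent"] / c["received"] if c["received"] else None)
        for day, c in sorted(counts.items())
    }

# Hypothetical toy data: one sent and two received on day 1, one sent on day 2.
msgs = [
    (date(2016, 3, 1), "sent"), (date(2016, 3, 1), "received"),
    (date(2016, 3, 1), "received"), (date(2016, 3, 2), "sent"),
]
ratios = daily_sent_received_ratio(msgs)
```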
Numerous aspects of this research were exploratory by nature. Given the lack of prior research examining when one might expect any differences to emerge prior to a suicide attempt, we did not have hypotheses on whether language differences would be observed for episodes overall (means) but not changes over time (slopes), or vice versa. We also did not have hypotheses about the pattern of any observed differences among episode type comparisons, such as whether differences would be unique to suicide attempts (differentiated from all other episode types) or shared between suicide attempt and ideation episodes (differentiated only from depressed and positive episode types).
In this study, we aimed to use ecologically valid private SMS data to gain insight into possible novel, real-time digital biomarkers of suicidal behaviors. By better understanding how language differs and changes as suicide risk increases, it may eventually become possible to develop more accurate and objective tools to determine level of suicide risk in real time and get individuals the help they need before they attempt suicide.
Methods
Participants and Recruitment
A sample of 33 participants with at least one reported past suicide attempt were recruited from the University of Virginia Psychology Department’s participant pool and from the Charlottesville community. To reach the lab study’s recruitment target, 2,377 individuals were screened online and 77 individuals were screened by phone. See Figure S1 (in the Supplemental Material available online) for a CONSORT diagram detailing specific numbers and reasons for exclusion.
Materials
Pre-lab study screeners.
Online screening surveys.
Participant pool participants were selected based on two surveys (full screening surveys and other study materials are available on OSF at https://osf.io/9f3v2/). On the participant pool pretest administered at the beginning of the semester (Survey 1), participants were asked, “Have you ever had a period of sadness in the past during which you felt hopeless?” and, if so, they were then asked whether they would like to be contacted about possible participation in studies that ask more questions about this period of time in their life. Those who said yes to both pretest questions were emailed a link to an additional two-question survey (Survey 2), which asked, “Have you ever made a suicide attempt?” and “Have you ever had thoughts of wanting to kill yourself?” Those endorsing a past suicide attempt were emailed and invited to participate in a phone screen to determine if they qualified for the study.
Phone screen.
The purpose of the phone screen was to provide potential participants with more information about the study and to ensure inclusion criteria were met. Inclusion criteria were:

(1) confirmation of group status based on report of a past suicide attempt;
(2) adult status (≥18 years old);
(3) availability of and access to personal messaging data dating back before significant life events (e.g., a suicide attempt); and
(4) minimal or no current desire to die (i.e., a rating of 5 or lower on a 0–10 Likert scale and no current suicide plan or intent).

Any participants with intense thoughts of suicide who were determined to be at “high risk” or “imminent risk” for suicidal behavior (as determined by a suicide risk assessment instrument) were excluded from study participation and referred for clinical care. Because we were interested in collecting and analyzing text communications made prior to suicidal or other events, participants were excluded if they did not have access to at least one data service type (e.g., text messages, Facebook) dating back to before their most recent suicide attempt.
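The phone-screen inclusion criteria amount to a simple decision rule. The sketch below encodes criteria (1) through (4) under stated assumptions. The `PhoneScreen` record and its field names are hypothetical. Criterion (3) is operationalized as having at least one message dated before the most recent attempt. The separate clinical risk assessment is not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class PhoneScreen:
    """Hypothetical phone-screen record; field names are illustrative only."""
    reported_past_attempt: bool
    age: int
    desire_to_die: int                          # current rating on the 0-10 scale
    current_plan_or_intent: bool
    most_recent_attempt: Optional[date]
    earliest_available_message: Optional[date]  # oldest message on any data service

def meets_inclusion_criteria(s: PhoneScreen) -> bool:
    """Apply the four inclusion criteria (risk assessment handled separately)."""
    # (1) Confirmed past suicide attempt with a known date.
    if not s.reported_past_attempt or s.most_recent_attempt is None:
        return False
    # (2) Adult status.
    if s.age < 18:
        return False
    # (3) At least some messaging data predating the most recent attempt.
    if (s.earliest_available_message is None
            or s.earliest_available_message >= s.most_recent_attempt):
        return False
    # (4) Minimal current desire to die and no current plan or intent.
    if s.desire_to_die > 5 or s.current_plan_or_intent:
        return False
    return True
```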
Communications data collection.
Participants downloaded their communication data in the lab with the experimenter’s assistance, which ensured transparency throughout the process. SMS text message data from iPhone and Android phones were accessed using third-party software or phone applications. Specifically, participants with iPhones were instructed to download their SMS text messages using software programs called iExplorer, SynciOS Manager, and SynciOS Data Recovery, and those with Android devices were instructed to download their messages using Android mobile apps called SMS Backup & Restore and SMS to Text. Though most text data was successfully extracted from both Android and iPhone devices, a number of iPhone users had encryption settings on their phones that prevented third-party iPhone software from accessing text data.
Participants were asked to bring into the lab as many other devices as they thought might contain digital data (e.g., laptop with iTunes backup, older phone), and all available SMS data from each device were downloaded (i.e., not from only certain dates or recipients). Attempts were also made to retrieve data via cloud storage (e.g., iCloud) for messages not available on physical devices (e.g., if a participant had a relatively new phone). Though participants were generally interested in providing as much information as possible, efforts to collect additional messages from sources other than their current mobile device were minimally successful. Only a small number of participants brought in or had access to old phones. Also, many participants did not know whether they had laptop or cloud backups, and among those who did, a number could not remember the encryption password necessary to access iTunes/iCloud.
Additional forms of personal digital data, including phone call history, Google data (Gmail, Hangouts, and Chrome browser history), Facebook messages, and Twitter messages, were collected for the purposes of future analyses but are not part of the current study. Following the lab session, the raw downloaded text data were transferred to and stored on a secure server intended for the storage of sensitive information. This data storage plan was reviewed and approved by the IRB.
Interview and episode identification.
The goal of the laboratory-based interview was to learn about past suicidal and non-suicidal events in greater detail so that digital communication made during and/or just prior to these events could later be compared using text analytic techniques. During the interview, participants were asked to identify a number of specific events or episodes in the past and the calendar dates during which the episodes took place. Episodes included:

1) past actual, interrupted, or aborted suicide attempts, using the two-week period prior to the attempt as a “suicide attempt episode”;
2) two-week episodes of suicide ideation (not resulting in a suicide attempt);
3) two-week episodes of depressed mood or high stress (not resulting in suicide ideation or attempt); and
4) two-week episodes of positive mood (i.e., more positive mood than usual and no ideation or attempt).

(Note that reported ‘suicide attempts’ also included some incidents in which no physical attempt was enacted. In these cases, participants considered their planning or actions to constitute a higher level of suicidal intent than ‘suicide ideation’ and so subjectively endorsed making an attempt on the screener questions.) Setting each episode at two weeks was a conservative effort to capture the critical period of increased ideation, planning, and intent leading up to a suicide attempt (Millner et al., 2016). The optimal time window was also assessed empirically, through visual inspection of descriptive figures to identify when the rate of change appeared most pronounced. Specifically, a temporal visualization was constructed for each variable of interest. Each plotted daily means for the given variable during the 30 days before and after the attempt, fitted with a smoother (loess) function. Although patterns varied considerably across variables, changes tended to occur most often and most notably around 7 days prior to the suicide attempt.
A total of 3 to 12 episodes were collected for each participant, depending on the presence and number of reported events. For each episode type, we asked about a maximum of three episodes (e.g., three suicide attempts if the participant had three or more lifetime attempts).
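The descriptive time-window check can be sketched in two steps. First, average a message-level score by day relative to the attempt (day 0). Second, smooth the daily means. The study reports using a loess smoother without specifying the implementation or span. The function below is a simplified, single-pass local linear regression with tricube weights that omits the robustness iterations of full LOWESS. The default `frac=0.5` is an arbitrary assumption.

```python
import math
from collections import defaultdict

def daily_means(scored_messages):
    """Average a message-level score by day relative to the attempt (day 0).

    `scored_messages` is an iterable of (relative_day, score) pairs, for
    example with relative_day running from -30 to 30.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, score in scored_messages:
        sums[day] += score
        counts[day] += 1
    days = sorted(sums)
    return days, [sums[d] / counts[d] for d in days]

def loess(xs, ys, frac=0.5):
    """Single-pass local linear regression with tricube weights.

    For each x0, the neighborhood radius is the distance to the k-th closest
    point (counting x0 itself), with k = ceil(frac * n); points at or beyond
    that radius receive zero weight.
    """
    n = len(xs)
    if n == 0:
        return []
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for x0 in xs:
        radius = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [max(0.0, 1.0 - (abs(x - x0) / radius) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

A local linear fit reproduces exactly linear data. A quick sanity check is therefore that smoothing a straight line over days −30 to 30 returns the same line.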
Critically, classifications of these reported episodes became the basis of the study’s within-subjects design, with episode type serving as our main predictor variable and language characteristics of text messages during the episodes as the outcome variables. In terms of suicide risk levels, attempt episodes were considered “high risk,” ideation episodes were considered “moderate risk,” and depressed and positive episodes were considered “minimal/no risk.” Participants answered additional questions about each reported episode as a way to minimize the risk of misclassification, including episode-specific questions about suicidal thoughts and behaviors, depression/anxiety symptoms, and state mood. Additional descriptive characteristics are available as Supplement 1 online.
General questionnaires.
Participants completed a number of general questionnaires at the end of the study, which were used to characterize the sample and were not tied to specific episodes. Specifically, participants provided information about their age, gender, race/ethnicity, citizenship, education, marital status, employment status, and living situation. Participants also reported on current and past treatment experience (e.g., medications, therapy) and psychiatric diagnoses. Some items related to symptom history were adapted from the screener sections of the World Health Organization World Mental Health-Composite International Diagnostic Interview (WHO WMH-CIDI; Kessler et al., 2004; World Health Organization, 2014).
Suicidal thoughts and behaviors.
The Self-Injurious Thoughts and Behaviors Interview (SITBI; Nock, Holmberg, Photos, & Michel, 2007) was used to assess participants’ history of self-injurious thoughts and behaviors. Participants were asked to rate the presence and frequency of each behavior (i.e., non-suicidal self-injury, suicide ideation, suicide plan, suicide attempts, subset of suicide attempts requiring medical attention) within the past month, the past year, the past three years, and lifetime.
Procedure
Participants were invited into the laboratory for one 2- to 2.5-hour session to complete several tasks. First, participants were instructed to download their private data sources (e.g., SMS). Second, participants were interviewed by the experimenter and asked to identify a number of episodes, including dates of past suicide attempts (and interrupted or aborted attempts) and two-week episodes of suicide ideation, depressed mood, and positive mood. Participants were then asked to describe specific details of and context surrounding each time period. Last, participants completed the aforementioned questionnaires.
Risk Assessment
Prior research indicates that asking young adults with previous suicide attempts about suicide does not increase psychological distress or suicidal thoughts or behaviors. This holds both immediately following an assessment (Gould et al., 2005) and several years later (Reynolds, Lindenboim, Comtois, Murray, & Linehan, 2006). However, as a precautionary measure, participants were asked two questions regarding negative mood and desire to die at both the beginning and the end of the lab session to assess any changes resulting from the interview and study visit. Some participants showed a meaningful increase in negative affect or suicidality, defined as an increase of 2 or more points on the 0 to 10 negative mood or desire to die rating scales. Others were elevated in current suicidality, defined as a score greater than 3 on the desire to hurt self question. Participants meeting either condition were administered a formal suicide risk assessment and assigned a risk level based on their answers. Per protocol, those considered at “moderate risk” would be assisted in developing a “safety plan,” a series of steps to take to keep oneself safe when feeling suicidal. Those considered at “high risk” or “imminent risk” of suicide would be asked to provide a contact number and would be contacted immediately by the laboratory director, a licensed clinical psychologist. No participants were found to be at high or imminent risk.
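The session-end screening rule can be written as a small function. This is an illustrative sketch only: the parameter names are hypothetical. We also assume the desire-to-hurt-self rating is the one collected at the end of the session, which the text does not state explicitly.

```python
def needs_formal_risk_assessment(pre_mood, post_mood,
                                 pre_desire_to_die, post_desire_to_die,
                                 desire_to_hurt_self):
    """Flag a participant for a formal suicide risk assessment.

    All ratings are on 0-10 scales. A participant is flagged if negative mood
    or desire to die rose by 2 or more points between the start and end of the
    session, or if the (assumed end-of-session) desire-to-hurt-self rating
    exceeds 3.
    """
    mood_increase = post_mood - pre_mood
    desire_increase = post_desire_to_die - pre_desire_to_die
    return mood_increase >= 2 or desire_increase >= 2 or desire_to_hurt_self > 3
```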
Plan for Analyses
Data preparation and scoring.
Because the format of SMS data differed between iPhone and Android phones, each participant’s SMS data files were individually cleaned using Python to standardize message encoding and variable names across participants. SMS data were then merged with the participant and episode information collected during the lab study. Each SMS message was scored using the 2015 version of Linguistic Inquiry and Word Count (LIWC), a language analysis software package that calculates numeric values based on the properties of a text (Pennebaker et al., 2015). Most LIWC variables are the percentage of words in a message belonging to a given category (e.g., a score of 60.0 indicates that 60% of the words in the message belonged to that category). These percentage scores were converted into counts and entered as proportions in our models (i.e., the number of words in a given text message belonging versus not belonging to the LIWC category). This approach captured the constructs of interest more precisely and weighted LIWC values appropriately by the number of words in each message. Other LIWC variables include counts of words (e.g., word count of the message, words per sentence) and several proprietary, “non-transparent” variables (e.g., Tone). See Table S3 for the specific LIWC variables calculated for the three psychological constructs. After this information was appended, the individual SMS files were compiled, and identifying information (e.g., message content, sender/recipient names) was removed prior to analysis.
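The conversion from LIWC percentages to binomial counts is simple: the percentage times the message's word count, divided by 100, gives the number of in-category words. A minimal sketch follows. Rounding to the nearest integer is our assumption, used to undo the rounding in reported percentages. The function name and toy records are hypothetical.

```python
def liwc_percent_to_counts(percent, word_count):
    """Convert a LIWC percentage into (in-category, out-of-category) word counts.

    Rounding recovers integer counts from rounded percentages (an assumption
    about how counts are reconstructed, not a documented LIWC feature).
    """
    in_category = round(percent * word_count / 100)
    if not 0 <= in_category <= word_count:
        raise ValueError("percentage is inconsistent with the word count")
    return in_category, word_count - in_category

# Hypothetical message-level LIWC output: word count plus one category percentage.
scored = [{"WC": 5, "anger": 60.0}, {"WC": 3, "anger": 33.33}, {"WC": 12, "anger": 0.0}]
rows = [liwc_percent_to_counts(m["anger"], m["WC"]) for m in scored]
print(rows)  # [(3, 2), (1, 2), (0, 12)]
```

In lme4, binomial outcomes of this form are conventionally supplied as `cbind(successes, failures)` on the left-hand side of a `glmer` formula. Whether the authors used exactly this specification is not reported.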
Preliminary analyses.
Preliminary descriptive analyses on demographic information, mental health and suicide history, and other information pertinent to the primary analyses (e.g., iatrogenic effects of the lab study) were performed on the sample of participants contributing at least one episode of messaging data.
Primary analyses.
We performed inferential analyses using mixed-effects models to examine within-subject (between-episode) differences among suicide attempters in several communication features. In these analyses, we focused on testing for both mean and slope differences between suicide attempt and other episode types (suicide ideation, depressed mood, positive mood) for the three previously discussed psychological constructs given our interest in understanding if and how communication patterns differ across episodes and whether there are language patterns unique to being in an imminent suicidal state.
Advantages of mixed-effects models.
A mixed-effects approach was selected because of its well-established advantages: it produces more accurate effect estimates and handles missing data, non-normal outcome data, and unbalanced classes (Baayen, Davidson, & Bates, 2008; Dixon, 2008; Jaeger, 2008). Mixed models allowed us to account for variability among participants, episodes, and messages and to resolve the non-independence of the nested data. This yielded more accurate and generalizable population estimates of the within-subject effects of episode type. The approach also allowed us to make use of message-level variance by analyzing individual messages rather than only episode means. It was especially appropriate for this dataset given the amount of ‘missing data’ (i.e., participants varied widely in the number of episode types they reported and for which they had text data).
Selection of random effects and specification of models.
In line with recommendations by Barr and colleagues (2013), we used a maximal random effects model, with random intercepts of participant, episode, and message and random by-participant slopes. This structure was chosen to improve the generalizability of the findings and protect against inflated Type I error rates. Models were fitted using the “lme4” package in R (Bates, Maechler, Bolker, & Walker, 2014; R Core Team, 2013). Generalized linear mixed models (GLMMs) were fitted with the “glmer” function using a logit link appropriate for binomially distributed data, so the estimated regression coefficients are expressed in log-odds units. For these analyses, a significant result indicates that the odds of a category-specific word appearing in text messages differed as a function of episode type. For outcome variables that were not proportion scores (e.g., number of messages sent per day), the raw continuous variable was analyzed with a linear mixed-effects model (LMM) using the “lmer” function. (See Supplement 2 online for additional information about mixed-effects models.)
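Because the binomial models use a logit link, coefficients are in log-odds units. The sketch below shows how such a coefficient translates into an odds ratio and into predicted probabilities. The intercept and coefficient values are hypothetical. They are raw coefficients, not the standardized β values reported in the study.

```python
import math

def odds_ratio(beta):
    """Exponentiate a logit-scale coefficient to obtain an odds ratio."""
    return math.exp(beta)

def inv_logit(eta):
    """Convert a linear predictor on the log-odds scale into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical values: intercept for a reference episode type and one contrast.
intercept, beta_contrast = -3.0, 0.2
p_reference = inv_logit(intercept)                  # P(word in category), reference episodes
p_contrast = inv_logit(intercept + beta_contrast)   # same probability, contrasted episodes
```

The odds ratio is constant across baseline rates, whereas the change in probability depends on the intercept. With rare word categories, a modest log-odds coefficient therefore corresponds to a small absolute change in the proportion of words.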
Plan for Primary Analysis 1: Does language differ between episode types?
A series of mixed-effects models was fitted with episode type as a within-subject fixed effect (4 levels: attempt, ideation, depressed, positive) and the language feature of interest as the outcome variable. The random effects comprised a by-participant random slope for episode type and random intercepts for participant, participant-episode, and message. Likelihood-ratio (Wald chi-square) tests compared the goodness of fit of models including and excluding the fixed effect of episode type. Such a test is conceptually similar to an omnibus test for a predictor in an ANOVA: it indicates whether including the fixed effect significantly improves model fit. Significant tests were followed up with pairwise comparisons between episode types, using z statistics for binary/proportion outcomes and t statistics for continuous outcomes. Mixed-effects models do not yield straightforward effect size statistics as other regression models do (e.g., R2), and there is no consensus on the most appropriate approach (see Peugh, 2010). Here, β (standardized values of the model parameter estimates in log-odds units) are reported and serve as effect sizes.
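For a proportion outcome, the model for this first set of analyses can be written as follows. The notation is our own, not taken from the original analysis code. It assumes that each message contributes the count of category words $y_{pem}$ out of its $n_{pem}$ words (a binomial outcome) and that attempt is coded as the reference level, which appears consistent with the contrast labels in Table 2 (e.g., "Ideation vs. Attempt").

$$
\begin{aligned}
y_{pem} &\sim \operatorname{Binomial}(n_{pem},\, \pi_{pem}) \\
\operatorname{logit}(\pi_{pem}) &= \beta_0 + \sum_{k \in \{I,D,P\}} \beta_k X_{k,pe} + u_{0p} + \sum_{k \in \{I,D,P\}} u_{kp} X_{k,pe} + v_{pe} + w_{pem}
\end{aligned}
$$

Here $p$ indexes participants, $e$ episodes within participants, and $m$ messages within episodes. $X_{k,pe}$ equals 1 if episode $e$ is of type $k$ (ideation $I$, depressed $D$, positive $P$) and 0 otherwise. $(u_{0p}, u_{Ip}, u_{Dp}, u_{Pp})$ are correlated by-participant random effects, and $v_{pe}$ and $w_{pem}$ are the episode-level and message-level random intercepts. The omnibus test of episode type compares this model with one that omits the three $\beta_k$ terms, i.e., $H_0: \beta_I = \beta_D = \beta_P = 0$ on 3 degrees of freedom, which matches the df reported in Table 2. For the continuous outcomes fitted as LMMs, the binomial likelihood and logit link would be replaced by a Gaussian likelihood and an identity link.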
Plan for Primary Analysis 2: Does language approaching a suicide attempt change differently over time relative to language changes during other episode types?
Like the first set of analyses, the second set tested for differences between suicide attempt episodes and other episode types. However, it did so by examining differences in changes in communication over time rather than overall mean differences. The purpose was to test whether communication changed differently during the 14 days leading up to a suicide attempt than during two-week periods for episode types of lower suicide risk (for which there is no theoretical expectation of temporal change). For each language feature, a mixed-effects model was fitted with 3 fixed effects: episode type (4 levels: attempt, ideation, depressed, positive), day of episode (a numeric variable ranging from −14 to 0), and the episode type × day interaction. The maximal random-effects structure appropriate for these data included random intercepts for participant, participant-episode, and message, and random by-participant slopes for episode type, day of episode, and their interaction. Models with the full set of random slopes did not converge; therefore, only a by-participant slope for episode type was included in the final model. Although episode type and day were included as fixed effects, only the interaction term was of interest, because episode type had already been evaluated separately and we had no theoretical interest in time as an independent variable. Goodness of fit was evaluated using the same procedures as in the first set of analyses. Significant interactions were followed up with pairwise contrasts comparing the attempt × time slope with the time slope for each other episode type.
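Using the same assumed notation as a sketch, the second set of models adds day of episode $t_{pem} \in \{-14, \dots, 0\}$ and its interaction with episode type. It keeps only the by-participant slope for episode type that was retained after the convergence problems:

$$
\operatorname{logit}(\pi_{pem}) = \beta_0 + \sum_{k} \beta_k X_{k,pe} + \gamma_0\, t_{pem} + \sum_{k} \gamma_k X_{k,pe}\, t_{pem} + u_{0p} + \sum_{k} u_{kp} X_{k,pe} + v_{pe} + w_{pem}, \qquad k \in \{I, D, P\}
$$

With attempt as the reference level, $\gamma_0$ is the daily change in log-odds during attempt episodes. Each $\gamma_k$ is the difference between the daily slope for episode type $k$ and the attempt slope. The interaction test is $H_0: \gamma_I = \gamma_D = \gamma_P = 0$ on 3 degrees of freedom, matching the df reported in Table 3.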
Condition comparisons using alternate data subsets.
Additional analyses examined the same set of outcome variables using alternate data subsets, to test the robustness of the primary results and to see whether the pattern of results changed. These analyses are available in Supplement 3 online.
Sample size and power considerations for mixed-effects designs.
Several studies examining some of the same LIWC variables have reported large effect sizes, for self-focus (ds=1.06–1.31; Stirman & Pennebaker, 2001; Venek et al., 2014), for sentiment (ds=0.88–1.21; Venek et al., 2014), and for constructs related to social engagement such as belongingness (d=1.52; Braithwaite et al., 2016). We therefore based our power analysis on the assumption of large effect sizes, although no prior studies have examined within-subject differences, which may or may not differ substantially from between-subjects comparisons. We determined that a sample of 30 suicide attempters would provide enough power to detect only large effects (Cramér's V=0.29) in chi-square tests of mixed-effects models comparing suicide attempts to other types of episodes, assuming 80% power and a significance level of .05. This sample was expected to represent 60 attempts, of which only 20 would have collected, usable SMS data, with the effective sample size further reduced by the design effect. The study was therefore underpowered to detect small- or medium-sized effects and prone to Type II error. Moreover, even for variables with theoretical reason to expect large effects, most prior research relied on between-subjects rather than within-subjects designs. It was therefore unknown whether previously observed effect sizes would hold when comparing episodes within subjects, as in this pilot study.
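The Cramér's V of 0.29 cited above appears to correspond to Cohen's conventional "large" effect for chi-square tests (w = 0.5) when the comparison has three degrees of freedom, because V = w / √(min(r, c) − 1). The Python sketch below shows that conversion. It assumes this relation and df* = 3 for the four-level episode-type comparison. It is not the power calculation the study actually ran.

```python
import math

def cramers_v_from_w(w: float, df_star: int) -> float:
    """Cramér's V from Cohen's w, where df_star = min(rows, cols) - 1."""
    return w / math.sqrt(df_star)

# Cohen's conventional benchmarks for the chi-square effect size w.
for label, w in [("small", 0.1), ("medium", 0.3), ("large", 0.5)]:
    v = cramers_v_from_w(w, df_star=3)
    print(f"{label:>6}: w = {w:.1f} -> V = {v:.3f} (df* = 3)")
```

With df* = 3, a large w of 0.5 maps to V ≈ 0.289, which rounds to the 0.29 reported above.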
Results
Preliminary Analyses
Sample characteristics.²
As shown in Table 1, the sample comprised 33 participants, most of whom were female, White, and college-aged. As expected based on recruitment criteria, all participants reported a history of at least one actual (enacted), interrupted, or aborted suicide attempt (about 80% reported making an actual suicide attempt), and about half reported a history of non-suicidal self-injury. The majority of participants reported having struggled with a mental health problem during their lifetime, and a little over half reported having a diagnosis, most commonly a mood and/or anxiety disorder.
Table 1.
Participant characteristics
Variable | N=33 |
---|---|
Mean (SD) age in years | 20.4 (2.4) |
Sex (% female) | 84.8 |
Citizenship (%) | |
U.S. | 97.0 |
Non-U.S. | 3.0 |
Ethnicity (%) | |
Non-Hispanic | 87.9 |
Hispanic | 12.1 |
Race (%) | |
Caucasian | 60.6 |
Asian | 18.2 |
African American | 9.1 |
Multiracial | 6.1 |
Other | 3.0 |
Education (%) | |
Graduate degree | 3.0 |
Bachelor’s degree | 3.0 |
Some college | 94.0 |
Lifetime self-harm presence (%) | |
Nonsuicidal self-injury (NSSI) | 48.4 |
Suicide ideation | 90.9 |
Actual (enacted) suicide attempt | 81.8 |
Interrupted/aborted suicide attempt | 78.8 |
Lifetime self-harm – mean # (SD) | |
Actual (enacted) suicide attempt | 1.43 (0.81) |
Interrupted suicide attempt | 1.62 (0.87) |
Aborted suicide attempt | 1.95 (2.06) |
Mental health problem lifetime presence (%) | |
Yes | 84.8 |
No | 9.1 |
Prefer not to answer | 6.1 |
Diagnoses lifetime presence (%) | |
Mood disorder | 48.5 |
Anxiety disorder | 39.4 |
Substance use disorder | 3.0 |
Any disorder | 57.6 |
Iatrogenic – change pre to post | |
Mood (0–10) | −0.20 |
Desire to die (0–10) | 0.20 |
Episode characteristics.
A total of 293 episodes were collected across all participants, with slightly more episodes reported for non-suicidal (i.e., depressed/positive mood) than for suicidal (attempt/ideation) periods. Of the episodes queried during the lab interview, 134 (46%) contained SMS data, collected from 27 participants. Among these 27 participants, 15 had data from at least one reported suicide attempt. The other 12 participants contributed data only from non-attempt episodes and were still included in the analyses. On average, participants reported, and had SMS data for, approximately 1.5 episodes of each episode type. Across all episodes, 189,478 text messages were collected and analyzed, averaging about 1,200–1,600 text messages per episode. Participants were generally confident about the accuracy of the dates they selected, rating 90.3% of episodes as ‘very certain’ (i.e., exact days) or ‘somewhat certain’ (i.e., may be off by a few days). Consistent with expectations, episode types with higher suicide risk were associated with greater suicide ideation and negative mood symptoms. (See Supplement 1 online for additional descriptive information and analyses of episodes.)
Characteristics of suicide attempts.
Additional descriptive information about reported actual, interrupted, or aborted suicide attempts was collected to better understand the methods used and the circumstances leading to the attempts. It was also used to check that the level of suicidal intent and lethality associated with the attempts was high (see Table S4 for additional information). Among participants with SMS data, 13 reported a single lifetime attempt and 14 reported multiple attempts (M=1.74, SD=0.86). Among the 21 attempts with corresponding SMS data, the most common methods used or considered/aborted were medications/overdose (43%), hanging/suffocation (38%), and jumping from a height (24%). Based on participants’ descriptions of the objective lethality of the attempt, 29% of incidents involved an action that caused some physical harm (e.g., taking a higher than normal dose of medication, resulting in nausea and light-headedness). The remaining 71% did not result in any physical harm (e.g., driving to a high place, such as a bridge, but deciding not to jump). Notwithstanding the low rates of actual physical harm, participants reported fairly high intent to kill themselves (M=3.95, SD=0.86), and their subjective judgment of the lethality of the suicide method was reasonably high (M=3.38, SD=0.86), supporting the serious nature of the attempt episodes. Further, a specific suicide plan (i.e., time and place) was present for most of the attempts (76%).
Iatrogenic effects.
To examine possible iatrogenic effects of the study, we compared pre- and post-study ratings. Mood did not change significantly from pre (M=6.33, SD=1.38) to post (M=6.12, SD=1.17), t(32)=1.05, p=.304. Desire to die decreased slightly but significantly from pre (M=0.82, SD=1.10) to post (M=0.61, SD=0.90), t(32)=2.23, p=.033.
Results of Primary Analysis 1: Does Language Differ Between Episode Types?
When testing whether language use was associated with episode type, analyses revealed no significant fixed effect of episode type for variables reflecting self-focus and social engagement. (See Table 2 for full results.)
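Because every omnibus test in Table 2 has 3 degrees of freedom, the reported p-values can be recomputed from the χ2 statistics using the closed-form chi-square survival function for df = 3. The Python sketch below does this with the standard library only. It is a checking aid, not part of the original analysis pipeline, and small discrepancies can arise from rounding of the reported statistics.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 3 degrees of freedom.

    Closed form for df = 3:
    P(X > x) = erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2)
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Omnibus statistics and reported p-values from Table 2 (df = 3).
table2 = {
    "Sadness (sad)": (8.74, 0.033),
    "Positive emotion (posemo)": (2.18, 0.536),
    "Ratio sent vs. received messages": (7.76, 0.051),
}
for name, (chi2, reported_p) in table2.items():
    print(f"{name}: chi2(3) = {chi2:.2f}, p = {chi2_sf_df3(chi2):.3f} (reported {reported_p:.3f})")
```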
Table 2.
Analysis of deviance table for fixed effect of episode type (follow-up pairwise comparisons listed beneath the corresponding omnibus test) on self-focus, sentiment, and social engagement variables
Variable | Omnibus test | df | p | β | SE | Pairwise test | p |
---|---|---|---|---|---|---|---|
Self-focus | | | | | | | |
1st-person pronouns – singular (i) | χ2=1.62 | 3 | .656 | | | | |
1st-person pronouns – plural (we) | χ2=0.28 | 3 | .964 | | | | |
Sentiment | | | | | | | |
Positive emotion (posemo) | χ2=2.18 | 3 | .536 | | | | |
Negative emotion (negemo) | χ2=2.15 | 3 | .543 | | | | |
Anxiety (anx) | χ2=3.62 | 3 | .306 | | | | |
Anger (anger) | χ2=1.45 | 3 | .694 | | | | |
Sadness (sad) | χ2=8.74 | 3 | .033 | | | | |
Intercept | | | | −5.135 | 0.072 | z=−71.81 | <.001 |
Ideation vs. Attempt | | | | −0.247 | 0.094 | z=−2.64 | .008 |
Depressed vs. Attempt | | | | −0.113 | 0.104 | z=−1.09 | .277 |
Positive vs. Attempt | | | | −0.097 | 0.091 | z=−1.07 | .285 |
Depressed vs. Ideation | | | | 0.134 | 0.078 | z=1.71 | .088 |
Positive vs. Ideation | | | | 0.149 | 0.069 | z=2.18 | .029 |
Positive vs. Depressed | | | | 0.016 | 0.073 | z=0.22 | .827 |
Death (death) | χ2=1.79 | 3 | .671 | | | | |
Emotional tone (Tone) | χ2=1.36 | 3 | .716 | | | | |
Social engagement | | | | | | | |
Ratio sent vs. received words | χ2=2.41 | 3 | .493 | | | | |
Ratio sent vs. received messages | χ2=7.76 | 3 | .051 | | | | |
# outgoing messages | χ2=6.75 | 3 | .080 | | | | |
# incoming messages | χ2=4.89 | 3 | .180 | | | | |
Note. Pairwise comparisons performed only for significant fixed effects
Regarding sentiment, there was a significant effect of episode type on the use of sad words, χ2(3)=8.74, p=.033. Pairwise comparisons revealed that attempt episodes were significantly higher in sad words than ideation episodes (z=2.64, p=.008), but not than depressed or positive mood episodes (zs=1.07–1.09, ps=.277–.285). No significant effects of episode type emerged for more general emotional content (i.e., positive emotion, negative emotion, emotional tone) or for other specific emotions/constructs (i.e., anxiety, anger, death).
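Assuming the pairwise comparisons are Wald tests, as z statistics from GLMM contrasts typically are, each z is approximately β/SE, and the two-sided p-value follows from the standard normal distribution. The Python sketch below recomputes a few sadness rows of Table 2 this way. Because β and SE are reported rounded to three decimals, the recomputed values agree with the table only to rounding (e.g., about −2.63 rather than −2.64).

```python
import math

def wald_z(beta: float, se: float) -> tuple[float, float]:
    """Wald z statistic and two-sided p-value for a single coefficient."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Rounded estimates from the sadness rows of Table 2.
pairs = {
    "Ideation vs. Attempt": (-0.247, 0.094),
    "Positive vs. Ideation": (0.149, 0.069),
    "Positive vs. Depressed": (0.016, 0.073),
}
for name, (beta, se) in pairs.items():
    z, p = wald_z(beta, se)
    print(f"{name}: z = {z:.2f}, p = {p:.3f}")
```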
Taken together, these results suggest few overall mean differences between episode types, although suicide attempts appear to be associated with greater use of language indicating sadness. However, sadness-related language was not unique to suicide attempts (i.e., attempts were higher in sad words than ideation episodes, but not than depressed or positive mood episodes).
Results of Primary Analysis 2: Does Language Approaching a Suicide Attempt Change Differently Over Time Relative to Language Changes During Other Episode Types?
When testing whether language use changed over the course of the two weeks leading up to a suicide attempt differently relative to change during the two weeks identified for other episode types, no significant interactions emerged for any of the variables related to social engagement (i.e., daily word/message counts for, and ratio between, outgoing and incoming messages). However, analyses revealed several significant episode type by time interactive effects for the other two constructs of interest. (See Table 3 for full results.)
Table 3.
Analysis of deviance for interactive effects of episode type and time (follow-up pairwise comparisons listed beneath the corresponding omnibus test) for self-focus, sentiment, and social engagement variables.
Variable | Omnibus test | df | p | β | SE | Pairwise test | p |
---|---|---|---|---|---|---|---|
Self-focus | | | | | | | |
1st-person pronouns – singular (i) | χ2=11.57 | 3 | .009 | | | | |
Episode type (Attempt vs. Ideation) × Time | | | | −0.015 | 0.005 | z=−2.99 | .003 |
Episode type (Attempt vs. Depressed) × Time | | | | −0.010 | 0.005 | z=−2.10 | .035 |
Episode type (Attempt vs. Positive) × Time | | | | −0.004 | 0.005 | z=−0.89 | .374 |
1st-person pronouns – plural (we) | χ2=1.37 | 3 | .712 | | | | |
Sentiment | | | | | | | |
Positive emotion (posemo) | χ2=41.67 | 3 | <.001 | | | | |
Episode type (Attempt vs. Ideation) × Time | | | | 0.010 | 0.005 | z=1.96 | .049 |
Episode type (Attempt vs. Depressed) × Time | | | | 0.027 | 0.005 | z=5.92 | <.001 |
Episode type (Attempt vs. Positive) × Time | | | | 0.016 | 0.005 | z=3.50 | <.001 |
Negative emotion (negemo) | χ2=4.93 | 3 | .177 | | | | |
Anxiety (anx) | χ2=2.31 | 3 | .511 | | | | |
Anger (anger) | χ2=7.83 | 3 | .049 | | | | |
Episode type (Attempt vs. Ideation) × Time | | | | −0.023 | 0.012 | z=−2.00 | .046 |
Episode type (Attempt vs. Depressed) × Time | | | | −0.026 | 0.011 | z=−2.49 | .013 |
Episode type (Attempt vs. Positive) × Time | | | | −0.027 | 0.011 | z=−2.54 | .011 |
Sadness (sad) | χ2=1.33 | 3 | .721 | | | | |
Death (death) | χ2=9.47 | 3 | .024 | | | | |
Episode type (Attempt vs. Ideation) × Time | | | | −0.049 | 0.027 | z=−1.83 | .067 |
Episode type (Attempt vs. Depressed) × Time | | | | −0.008 | 0.025 | z=−0.33 | .739 |
Episode type (Attempt vs. Positive) × Time | | | | −0.021 | 0.025 | z=−0.82 | .412 |
Emotional tone (Tone) | χ2=26.81 | 3 | <.001 | | | | |
Episode type (Attempt vs. Ideation) × Time | | | | 0.149 | 0.101 | t=1.48 | .140 |
Episode type (Attempt vs. Depressed) × Time | | | | 0.401 | 0.094 | t=4.27 | <.001 |
Episode type (Attempt vs. Positive) × Time | | | | 0.083 | 0.096 | t=0.86 | .390 |
Social engagement | | | | | | | |
Ratio sent vs. received # words | χ2=4.99 | 3 | .172 | | | | |
Ratio sent vs. received # messages | χ2=1.70 | 3 | .636 | | | | |
# outgoing messages | χ2=4.22 | 3 | .238 | | | | |
# incoming messages | χ2=1.58 | 3 | .664 | | | | |
Note. Pairwise comparisons performed only for significant interactions
Regarding self-focus, there was a significant interaction for singular first-person pronoun use, χ2(3)=11.57, p=.009. Pairwise comparisons revealed that the change over time approaching a suicide attempt differed significantly from the change over time for ideation (z=2.99, p=.003) and depressed mood (z=2.10, p=.035), but not for positive mood (z=0.89, p=.374). As shown in Figure 1A, self-focus tended to increase preceding an attempt, whereas depressed and positive episodes were flatter and ideation episodes appeared to decline over time.
Figure 1.
Differences in episode type × day of episode interaction for (A) first-person pronoun use, (B) positive emotion, (C) anger, (D) death words, and (E) emotional tone.
Regarding sentiment, the use of words indicating positive emotion, anger, death, and emotional tone changed over time differently as a function of episode type. There was a significant interaction for positive emotion, χ2(3)=41.67, p<.001. Positive emotion decreased more steeply during attempt episodes than during all other episode types: ideation (z=1.96, p=.049), depressed (z=5.92, p<.001), and positive (z=3.50, p<.001) episodes (Figure 1B). There was also a significant interaction for anger words, χ2(3)=7.83, p=.049. Anger increased more steeply during attempt episodes than during all other episode types: ideation (z=2.00, p=.046), depressed (z=2.49, p=.013), and positive (z=2.54, p=.011) episodes (Figure 1C). The interaction for death words was also significant, χ2(3)=9.47, p=.024. However, change over time during attempt episodes did not differ significantly from that during any other episode type (zs=0.33–1.83, ps=.067–.739) (Figure 1D). Instead, ideation episodes appeared to show a steeper decrease in death words than both attempt and positive mood episodes. Lastly, there was a significant episode type × time interaction for emotional tone, χ2(3)=26.81, p<.001. Positive, upbeat language decreased over time during attempt episodes relative to depressed episodes (t=4.27, p<.001), but not relative to ideation (t=1.48, p=.140) or positive (t=0.86, p=.390) episodes (Figure 1E).
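The interaction coefficients in Table 3 are per-day differences in log-odds slopes, so they can look tiny. The sketch below assumes, as the text's descriptions imply, that each coefficient is the comparison episode's slope minus the attempt slope. Under that assumption, a per-day difference γ accumulates to 14γ log-odds over the window from day −14 to day 0, i.e., a relative change in odds of exp(14γ). It uses two Table 3 estimates and ignores all other model terms; it is illustrative only.

```python
import math

def slope_gap_over_window(daily_diff: float, days: int = 14) -> float:
    """Odds ratio implied by a per-day log-odds slope difference accumulated over `days`."""
    return math.exp(daily_diff * days)

# Interaction estimates from Table 3 (log-odds per day).
examples = [
    ("posemo, Attempt vs. Depressed", 0.027),
    ("anger, Attempt vs. Positive", -0.027),
]
for name, diff in examples:
    ratio = slope_gap_over_window(diff)
    print(f"{name}: {diff:+.3f}/day -> relative odds change {ratio:.2f} over 14 days")
```

For positive emotion (Attempt vs. Depressed), a slope difference of 0.027 per day corresponds to roughly a 1.46-fold relative change in odds across the two weeks.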
Taken together, the results suggest that communication may change differently during the time leading up to a suicide attempt than during other times of lesser suicide risk. Unique to attempts, positive emotion decreased and anger increased to a greater extent as one approached a suicide attempt, relative to the other episode types. In addition, self-focus changed over time differently for attempts than for ideation and depressed episodes, but not positive episodes, suggesting that self-focus did not uniquely distinguish attempts from other episodes.
Discussion
In this pilot study, we examined private electronic communication from past suicide attempters as a potential source of real-time digital biomarkers of heightened suicide risk. We employed a within-subjects design to evaluate how language use in text messages differed and changed over time just before a suicide attempt (high risk). We compared this period with times when participants had suicidal thoughts but did not attempt (moderate risk), were depressed but not suicidal, or were in a positive mood (low/minimal risk). We used automated language analysis software (LIWC) to score a set of variables intended to capture three psychological constructs of interest (self-focus, sentiment, and social engagement). We then tested both for overall mean differences in language use and for differences in change over time during the two weeks before a suicide attempt, relative to other episode types.
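LIWC-style scoring amounts to counting, for each text, the share of its words that match each dictionary category. The Python sketch below illustrates the idea with a tiny hypothetical dictionary. `TOY_DICTIONARY` is an illustrative assumption: the actual LIWC dictionary is licensed, far larger, and also matches word stems. Its numbers are therefore not comparable to LIWC output.

```python
import re

# Hypothetical mini-dictionary for illustration only (not the LIWC dictionary).
TOY_DICTIONARY = {
    "i": {"i", "me", "my", "mine", "myself"},
    "posemo": {"happy", "love", "good", "great"},
    "anger": {"hate", "angry", "mad", "annoyed"},
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())

def category_proportions(message: str) -> dict[str, float]:
    """Share of a message's words that fall into each dictionary category."""
    words = tokenize(message)
    if not words:
        return {cat: 0.0 for cat in TOY_DICTIONARY}
    return {
        cat: sum(word in vocab for word in words) / len(words)
        for cat, vocab in TOY_DICTIONARY.items()
    }

print(category_proportions("I hate how tired I feel"))
```

In the example message, 2 of 6 words are first-person singular pronouns and 1 of 6 is an anger word. A message-level count and word total of this kind is also what a binomial mixed model would take as its outcome.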
In terms of overall mean differences, few reliable differences emerged, although the period of high risk just before a suicide attempt was associated with messages containing more sadness-related language. However, this difference was not uniquely associated with suicide attempt episodes and is therefore not a specific characteristic of a high suicide risk state. Attempt episodes differed from ideation episodes in sadness, but this difference did not hold when attempts were compared with non-suicidal (depressed or positive mood) episodes. The mean-difference analyses therefore did not identify language features in text messages that could reliably identify high or moderate suicide risk states.
However, when examining differences in patterns over time, results suggested that communication changed differently during the time leading up to a suicide attempt than during other periods of lesser risk. Unique to attempts, anger increased and positive emotion decreased to a greater extent during the two weeks before suicide attempts, relative to the other episode types. Language indicating self-focus tended to increase over time during attempt episodes, though its trajectory could not reliably differentiate high suicide risk from other risk states. Overall, these results suggest that a small set of specific private text communication habits, particularly the use of emotional language, may provide clues into the suicidal mind and serve as temporally sensitive markers of suicide risk.
Digital Communication Patterns as Novel Markers of Risk
Self-focus.
We hypothesized that self-focus, operationalized as singular first-person pronoun use (“I” words), would be greater during suicide attempt episodes than during other episodes. This hypothesis was based on a theory construing suicide as a means of escaping negative self-focus (Baumeister, 1990) and the vicious feedback loop created by increasing self-focus and recognition of self-failures. It also drew on previous research demonstrating more self-focused communication among suicidal individuals (Stirman & Pennebaker, 2001; Venek et al., 2014). Results indicate that self-focus was not especially pronounced for attempt episodes across the entire two-week episode, but that it appeared to increase during those two weeks. However, this increase was steeper only relative to ideation and depressed mood, not relative to positive mood.
This result ran contrary to our hypothesis and to the general premise that the degree of self-focus would map onto risk levels (i.e., no, low, moderate, high) in a linear manner. However, prior research suggests why a language variable such as self-focus might behave similarly for attempt and positive episodes despite their being at opposite ends of the risk scale. Agitation and anxiety are better predictors of suicide attempt than a clinical diagnosis of depression (Busch, Fawcett, & Jacobs, 2003; Nock, Hwang, et al., 2010). This is in line with the view that a suicide attempt, in contrast to suicide ideation or depression, is a behavior that requires energy and activation to enact. Accordingly, psychological states during attempt and positive episodes may share similarities, which may be reflected in similar language use. Self-focus may therefore be especially sensitive to changes in risk related to increased energy or activation, with the “signal” detectable only by examining subtle temporal changes. Even so, this interpretation is speculative, and change in self-focus over time did not differentiate high risk from lower risk states, which limits its current utility for identifying individuals at risk of suicide attempt.
Sentiment.
The current study hypothesized that text message communication before suicide attempts (relative to other time periods) would exhibit more negative sentiment (higher negative emotion and lower positive emotion), given prior research identifying a number of affective and emotional factors associated with suicide risk. One question was whether language reflecting negative emotion (or a lack of positive emotion), a common feature of many psychological disorders (Brown, Chorpita, & Barlow, 1998), could be used to make fine-grained distinctions between levels of suicide risk. Interestingly, suicide attempts were associated with higher overall levels of sadness, but with substantial overlap with other episodes. By contrast, attempts were uniquely associated with changes over time in both positive emotion and anger. Specifically, decreases in positive emotion and increases in anger were significantly steeper leading up to a suicide attempt than during other episodes.
A difference between episode types in anger is not entirely surprising. Prior research has found an association between trait anger and suicide attempts (Ammerman, Kleiman, Uyeji, Knorr, & McCloskey, 2015; Daniel, Goldston, Erkanli, Franklin, & Mayfield, 2009; Hawkins & Cougle, 2013), and, as discussed above, suicide attempts may require an increase in activation to enact. However, the fact that this difference emerged only when examining language use over time (i.e., leading up to a suicide attempt) underscores the potential importance and utility of examining risk factors for suicide attempts dynamically. Similarly, it was somewhat unexpected that differences in positive emotion between suicide attempt and other episodes emerged only when examining change over time. One might have expected a more persistent lack of positive emotion in language during the two weeks before a suicide attempt. These findings raise the intriguing possibility that psychological constructs like sentiment, which do not seem especially specific to suicide, may serve as unique indicators of high suicide risk when examined over time in high-risk populations.
Interestingly, use of death-related words did not distinguish attempt episodes from other episodes, either overall or over time. This suggests the need to identify hidden and more subtle signs of risk beyond explicit, suicide-specific language. Given that this was a small pilot study, more research is needed. Still, these findings underscore the utility of identifying risk markers using real-time data, and they raise the possibility that specific, temporally sensitive markers of suicide risk may be found in seemingly general, trait-like psychological constructs such as sentiment and emotion.
Social engagement.
The interpersonal theory of suicide (Joiner, 2005; Van Orden et al., 2010) proposes that suicidal motivation is driven by feelings of thwarted belongingness and perceived burdensomeness. In theory, social support should counteract these feelings and increase feelings of connectedness. We hypothesized that suicide attempt episodes would show greater signs of social disengagement. Results did not support this hypothesis: counts of, and ratios between, sent and received text messages did not differ between attempts and other episodes, whether examined across episodes overall or over time.
Although these data do not provide evidence that communication habits, separate from language content, may be useful indicators of suicide risk, these particular measures of social engagement may have been too basic to detect a meaningful signal. For example, we examined only aggregate information about incoming and outgoing messages. We could not examine more fine-grained details of these interactions, such as who initiated conversations, whether texts to participants went unanswered, or whether the content of texts indicated social distress or rejection. Future studies on these or other data could examine more intricate interpersonal dynamics to determine whether other social factors help identify signs of heightened suicide risk.
Future Directions in Textual Analyses to Enhance Suicide Prediction
There are a number of exciting questions to pursue in the future. In this study, we limited our linguistic analyses to a handful of categories based on the presence of single words in a dictionary-based tool (LIWC). One weakness of this approach is that examining single words in isolation can fail to capture their semantic context (e.g., “not happy” would count as “positive emotion” because negation is not accounted for). In addition, the dictionary's categories are not tailored to constructs of interest for suicide specifically. Future studies could examine 2- or 3-word phrases (n-grams) or word embeddings, popular natural language processing methods that capture more semantic meaning. A qualitative coding approach may also be possible, whereby researchers develop a codebook of themes (e.g., relationship distress, hopelessness) and raters blindly code episodes to see whether they differ thematically. Furthermore, data-driven methods for understanding attempt episodes include unsupervised learning techniques such as topic modeling. These could identify themes in communication before suicide attempts, providing richer descriptions and informing future predictors of interest. (See Kern et al., 2016 for a useful review of methods for analyzing social media language.) A future study could also examine how language and communication habits change after a suicide attempt, which could provide valuable insight into the psychological effects of suicidal behaviors. It should be acknowledged, however, that collecting SMS and other private digital communication is likely to remain an ongoing challenge for researchers. This is particularly true as younger populations move their interpersonal communication to closed ecosystems and proprietary platforms (e.g., Snapchat).
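To make the negation problem and the n-gram remedy concrete, the Python sketch below does two things. It extracts bigrams, and it compares a naive positive-word count with a simple negation-aware count that skips positive words immediately preceded by a negator. The word lists are hypothetical, and this one-word lookback is a deliberately crude heuristic, not a method used in this study or in LIWC.

```python
import re

NEGATORS = {"not", "no", "never", "don't", "isn't", "wasn't"}  # hypothetical list
POSITIVE = {"happy", "good", "great", "love"}                  # hypothetical list

def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token sequences (bigrams when n = 2)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def positive_hits(tokens: list[str]) -> tuple[int, int]:
    """Return (naive count, negation-aware count) of positive words."""
    naive = sum(t in POSITIVE for t in tokens)
    negation_aware = sum(
        t in POSITIVE and not (i > 0 and tokens[i - 1] in NEGATORS)
        for i, t in enumerate(tokens)
    )
    return naive, negation_aware

tokens = tokenize("I am not happy today")
print(ngrams(tokens, 2))      # includes the bigram ('not', 'happy')
print(positive_hits(tokens))  # (1, 0)
```

A single-word count scores "happy" as positive, whereas the bigram ('not', 'happy') preserves the negation that a richer model could learn from.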
The results of the current study could also serve as the basis for building a machine learning model to automate identification of text features associated with suicide risk. Machine learning refers to a set of algorithms designed to predict class membership from a set of features (James, Witten, Hastie, & Tibshirani, 2013). Machine learning techniques may be particularly useful for predicting low base rate behaviors like suicide attempts. In a study of Army soldiers, Kessler and colleagues (2015) created a machine learning classifier based on known risk factors and found that the 5% of individuals assigned by the classifier to the highest risk category accounted for over 50% of suicide deaths at follow-up. In another study, Walsh, Ribeiro, and Franklin (2017) used electronic health records to develop machine learning algorithms that predicted future suicide attempts among adult patients. The accuracy achieved in these studies demonstrates that an ensemble of predictors, each of trivial predictive value on its own, can predict a complex, multifactorial clinical outcome. The current results could guide development of a machine learning model that predicts and classifies episode types by identifying text features associated with suicide risk, which our research group has already begun to explore. Such a rich dataset (text messages) offers the opportunity to see whether a bottom-up, data-driven approach can detect signals that are statistically related to suicide but are not known theoretically and would otherwise go undetected by humans. In a recently published study, our group used a deep neural network classifier to model within-subject differences between attempt/ideation episodes and depressed mood episodes using an atheoretical set of communication variables (n-grams), including ones not analyzed in the current study (Nobles et al., 2018).
Sensitivity and specificity were moderate to strong, indicating that the algorithm performed fairly well at classifying episodes. These findings suggest the promise of detecting “hidden” but meaningful signals using predictive models, even when only a small number of classification units are available. Using this machine learning framework, future studies could track people in real time (vs. retrospectively) to determine sensitivity and specificity of suicide risk predictions as well as fuse SMS with other smartphone-based digital data streams (e.g., voice samples via microphone, spatial trajectories via GPS) to provide more robust digital phenotyping (Torous et al., 2016).
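Sensitivity and specificity summarize a classifier's confusion matrix. Sensitivity is the proportion of truly higher-risk episodes that are flagged; specificity is the proportion of comparison episodes correctly left unflagged. The Python sketch below computes both from hypothetical labels and predictions. It does not reproduce the Nobles et al. (2018) model or data, and it assumes both classes are present; otherwise the divisions are undefined.

```python
def sensitivity_specificity(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Sensitivity (true positive rate) and specificity (true negative rate).

    Labels: 1 = higher-risk class (e.g., attempt/ideation episodes),
            0 = comparison class (e.g., depressed-mood episodes).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical episode labels and classifier outputs (not data from this study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")  # 0.75, 0.83
```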
Clinical Implications
Despite decades of research, judgments of imminent suicide risk remain low in accuracy, in part because they rely on at-risk individuals’ subjective self-report, which is vulnerable both to deliberate concealment and to an inability to accurately assess one’s own current state. Knowledge gained from this study could bring us one step closer to an objective monitoring tool capable of tracking individuals’ communication “behind the scenes” and notifying suicidal individuals and/or their clinicians or family members if their communication patterns indicate increasing suicide risk. To further increase precision, it may someday be possible to develop a machine learning algorithm that “learns” how a given individual generally differs from a normative sample, thereby individualizing its predictions. An approach that is both temporally sensitive and attentive to individual differences could have profound implications for predicting not just who is at risk of a suicide attempt, but when.
Although possible future clinical applications of this work could help address a major public health burden, developing a predictive tool would raise a number of important ethical challenges. These include determining how the consent or permission process for users would work, who would be notified if text messages indicated elevated risk, who would have access to model predictions (e.g., insurance companies), how data would be de-identified and stored, and what intervention would be undertaken. As with diagnostic medical tests that produce certain rates of false positives and false negatives, decisions would need to be made regarding the most appropriate threshold for what counts as “elevated risk” warranting intervention. For example, is it preferable to flag more individuals with less certainty of risk (producing more false positives) or fewer individuals with greater certainty of risk (producing more false negatives)? The field would also need to grapple with questions related to mandated reporting and involuntary hospitalization. For example, what is the most appropriate action for someone who denies having suicidal thoughts, plans, or intent but whose text messages indicate elevated risk? Would such a situation warrant hospitalization? These and many other ethical questions will need to be addressed before a predictive monitoring tool for suicide risk can be implemented effectively.
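The threshold trade-off described above can be illustrated with a small Python sketch. Sensitivity is the proportion of truly elevated-risk cases that are flagged, TP / (TP + FN); specificity is the proportion of lower-risk cases that are not flagged, TN / (TN + FP). The scores and labels below are toy values invented for illustration (not data from this study), and `sens_spec_at` is a hypothetical helper, not part of any tool described here.

```python
from typing import Sequence, Tuple

def sens_spec_at(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when every score >= threshold is flagged.

    labels: 1 = elevated-risk case, 0 = lower-risk case (hypothetical coding).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)  # flagged, truly elevated
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)   # missed, truly elevated
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)   # not flagged, lower risk
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)  # flagged, lower risk
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Toy model scores and true labels, invented purely for illustration.
scores = [0.95, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for t in (0.75, 0.50, 0.25):
    sens, spec = sens_spec_at(scores, labels, t)
    print(f"threshold={t:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

With these toy values, lowering the threshold from 0.75 to 0.25 raises sensitivity from 0.50 to 1.00 while specificity falls from 1.00 to 0.50: flagging more cases catches every true positive at the cost of more false positives.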
Limitations and Conclusion
There are several methodological limitations to acknowledge. First, episode dates and details were based on retrospective self-report, which may not have been entirely reliable, especially for less recent episodes. A prospective design in which participants were assessed frequently for the presence of suicidal thoughts or behaviors would resolve some concerns about self-report. However, such a design was not practical given the very low base rate of suicide attempts, which would require more participants than is feasible for a laboratory study. Further, concerns about retrospective report are somewhat mitigated because, although suicide history was reported retrospectively, the communication data used for analyses were not; they were generated in participants’ everyday lives and are thus ecologically valid and not prone to demand characteristics.
Second, classification of episode type depended on participants’ interpretations of whether their behaviors qualified as a specific type of event. Prior research has shown that single-item self-report questions can lead to misclassification (Millner et al., 2015). To mitigate this potential limitation, we used precise wording and asked multiple follow-up questions to assess the suicidality of each episode beyond a yes/no response (e.g., rating suicide ideation severity on a continuous scale). However, future studies could use more objective measures, such as clinical charts, to categorize events (though two suicide attempts with the same level of medical severity or lethality do not necessarily entail the same extent of planning, intent, and desire to die).
Third, a strength of this study design is its emphasis on differentiating suicide attempts from ideation, given that we are ultimately concerned with preventing suicidal behaviors, not just thoughts. However, the logistics of accessing and downloading communication data required enrolling participants whose attempts were nonlethal. The characteristics of suicide attempters may differ from those of suicide completers, and prior research offers some support for this possibility (e.g., DeJong, Overholser, & Stockmeier, 2010; Joiner, Pettit, Walker, & Voelz, 2002). For example, suicide completers (vs. nonlethal attempters) demonstrate higher levels of perceived burdensomeness and are more likely to have experienced job and financial stress and to have used alcohol or drugs prior to their attempt. Further, although rates of suicide attempts are higher among females, males are about four times more likely to die by suicide (Murphy, Xu, & Kochanek, 2013). Relatedly, we used a broad definition of a “suicide attempt episode,” including not only attempts in which some concrete action was initiated (e.g., at least one pill swallowed) but also interrupted or aborted attempts (e.g., traveling to a high place and strongly considering jumping). This broad definition may have altered, and possibly attenuated, the observed effects.
Fourth, the numbers of participants and reported episodes in this pilot study were small, providing adequate power to detect only large effects. In addition, we specified an elaborate random effects structure for the mixed-effects models to maximize the generalizability of the results, which plausibly reduced power further. There is therefore an increased risk of Type II error. Even so, this limitation should be weighed against the tradeoffs of alternative research designs, such as the impracticality of a prospective design noted above.
Fifth, our analyses involved a fairly large number of tests, which increases the probability of rejecting at least one true null hypothesis (a Type I error) by chance alone. Given that this is a pilot study with a small, potentially underpowered sample, and that we did not have concrete directional hypotheses for many of the tests, we chose not to suppress Type I error artificially (e.g., via a multiple comparisons correction) but rather to interpret significant results in light of these caveats (Rothman, 1990). Also, using maximal (or near-maximal) random effects structures likely decreased the chance of Type I error, given prior research arguing that maximal random effects structures can be overly conservative and lead to a substantial loss of power (Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017).
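To make the multiple-testing concern concrete: if m tests are each run at level α, all null hypotheses are true, and the tests are independent, the probability of at least one false positive (the familywise error rate, FWER) follows the expression below. The value m = 20 is a hypothetical count chosen for illustration, not the number of tests conducted in this study, and the independence assumption is a simplification that correlated language outcomes would not strictly satisfy. The last line shows the stricter per-test level a Bonferroni correction would impose in that hypothetical case.

```latex
\begin{aligned}
\mathrm{FWER} &= 1 - (1 - \alpha)^{m} \\
\alpha = 0.05,\ m = 20:\quad \mathrm{FWER} &= 1 - 0.95^{20} \approx 1 - 0.358 = 0.642 \\
\text{Bonferroni per-test level:}\quad \alpha / m &= 0.05 / 20 = 0.0025
\end{aligned}
```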
The current study utilized a novel, within-subject, laboratory-based research design to identify and better understand real-time patterns in communication unique to periods preceding suicide attempts. This is the first research study, to our knowledge, to examine the association between private text messaging data and mental health outcomes, suicide or otherwise. This laboratory investigation identified novel predictors of suicidal behaviors, which may be utilized in the future by machine learning models to predict acute suicide risk and identify whether and where an individual is on the pathway from thinking about suicide to acting on those thoughts. It is our hope that this research puts us one step closer to developing more objective, effective ways to predict and prevent future suicide-related behaviors.
Supplementary Material
Acknowledgements
We would like to thank Tara Saunders, Austin Smith, and Abbie Starns for their help with study design and data collection, and members of the Teachman PACT Lab and the Barnes S2He Lab for their feedback. We would also like to thank Clay Ford and the Research Data Services team at the UVA Library, Courtney Soderberg and the Center for Open Science, and Eric Turkheimer for their consultation on data analytic issues. We would like to acknowledge the support and conceptual contributions of Charlene Deming, Karthik Dinakar, Adam Jaroszewski, Evan Kleiman, Alex Millner, and Matthew Nock. This research was supported by UVA Presidential Fellowships in Data Science (awarded to Jeffrey Glenn and Alicia Nobles), an NIMH grant (R34MH106770), and a Templeton Science of Prospection grant (awarded to Bethany Teachman). Additionally, Alicia Nobles was supported by an NIH training grant (T32LM012416), and Jeffrey Glenn was supported by the Department of Veterans Affairs Office of Academic Affiliations Advanced Fellowship Program in Mental Illness Research and Treatment. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States Government.
Footnotes
Two additional specific LIWC variables – cognitive processes and social processes – were examined on an exploratory basis; results are available in Supplement 4. Additional language variables related to time orientation (e.g., focus on the future) and cognitive performance (e.g., complexity of language) were examined but are not the current focus of this paper given space constraints and the lack of prior research related to those specific constructs. Full results for these other variables are available from the first author.
A recently published study by our group utilizing this same sample and dataset focused on applying machine learning using additional language variables (Nobles, Glenn, Kowsari, Teachman, & Barnes, 2018).
References
- Ammerman BA, Kleiman EM, Uyeji LL, Knorr AC, & McCloskey MS (2015). Suicidal and violent behavior: The role of anger, emotion dysregulation, and impulsivity. Personality and Individual Differences, 79, 57–62.
- Baayen RH, Davidson DJ, & Bates DM (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
- Barr DJ, Levy R, Scheepers C, & Tily HJ (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
- Bates D, Maechler M, Bolker B, & Walker S (2014). lme4: Linear mixed-effects models using Eigen and S4.
- Baumeister RF (1990). Suicide as escape from self. Psychological Review, 97(1), 90–113.
- Borges G, Nock MK, Abad JMH, Hwang I, Sampson NA, Alonso J, . . . Bromet E (2010). Twelve-month prevalence of and risk factors for suicide attempts in the World Health Organization World Mental Health Surveys. The Journal of Clinical Psychiatry, 71(12), 1617–1628.
- Braithwaite SR, Giraud-Carrier C, West J, Barnes MD, & Hanson CL (2016). Validating Machine Learning Algorithms for Twitter Data Against Established Measures of Suicidality. JMIR Mental Health, 3(2), e21.
- Brown TA, Chorpita BF, & Barlow DH (1998). Structural relationships among dimensions of the DSM-IV anxiety and mood disorders and dimensions of negative affect, positive affect, and autonomic arousal. Journal of Abnormal Psychology, 107(2), 179.
- Bulik CM, Carpenter LL, Kupfer DJ, & Frank E (1990). Features associated with suicide attempts in recurrent major depression. Journal of Affective Disorders, 18(1), 29–37.
- Busch KA, Fawcett J, & Jacobs DG (2003). Clinical correlates of inpatient suicide. Journal of Clinical Psychiatry, 64(1), 14–19.
- Cambria E, Schuller B, Xia Y, & Havasi C (2013). New avenues in opinion mining and sentiment analysis. IEEE Intelligent Systems, 28(2), 15–21.
- Centers for Disease Control and Prevention. (2014). Web-based Injury Statistics Query and Reporting System (WISQARS). Retrieved March 31, 2015, from http://www.cdc.gov/ncipc/wisqars
- Daniel SS, Goldston DB, Erkanli A, Franklin JC, & Mayfield AM (2009). Trait anger, anger expression, and suicide attempts among adolescents and young adults: A prospective study. Journal of Clinical Child & Adolescent Psychology, 38(5), 661–671.
- De Choudhury M, Kiciman E, Dredze M, Coppersmith G, & Kumar M (2016). Discovering shifts to suicidal ideation from mental health content in social media. Paper presented at the Proceedings of the 2016 CHI conference on human factors in computing systems.
- DeJong TM, Overholser JC, & Stockmeier CA (2010). Apples to oranges?: A direct comparison between suicide attempters and suicide completers. Journal of Affective Disorders, 124(1), 90–97.
- Dixon P (2008). Models of accuracy in repeated-measures designs. Journal of Memory and Language, 59(4), 447–456.
- Gould MS, Marrocco FA, Kleinman M, Thomas JG, Mostkoff K, Cote J, & Davies M (2005). Evaluating iatrogenic risk of youth suicide screening programs: A randomized controlled trial. JAMA, 293(13), 1635–1643.
- Gunn JF, & Lester D (2012). Twitter postings and suicide: An analysis of the postings of a fatal suicide in the 24 hours prior to death. Suicidologi, 27(16), 42.
- Hammond KW, & Laundry RJ (2014). Application of a Hybrid Text Mining Approach to the Study of Suicidal Behavior in a Large Population. Paper presented at the 47th Hawaii International Conference on System Sciences.
- Hawkins KA, & Cougle JR (2013). A test of the unique and interactive roles of anger experience and expression in suicidality: Findings from a population-based study. The Journal of Nervous and Mental Disease, 201(11), 959–963.
- Hawton K, Casanas ICC, Haw C, & Saunders K (2013). Risk factors for suicide in individuals with depression: A systematic review. Journal of Affective Disorders, 147(1–3), 17–28.
- Jaeger TF (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446.
- James G, Witten D, Hastie T, & Tibshirani R (2013). An introduction to statistical learning (Vol. 112). New York: Springer.
- Joiner TE Jr, Pettit JW, Walker RL, & Voelz ZR (2002). Perceived burdensomeness and suicidality: Two studies on the suicide notes of those attempting and those completing suicide. Journal of Social and Clinical Psychology, 21(5), 531.
- Joiner T (2005). Why People Die by Suicide. Cambridge, MA: Harvard University Press.
- Kagan V, Rossini E, & Sapounas D (2013). Sentiment Analysis for PTSD Signals. Springer.
- Kern ML, Park G, Eichstaedt JC, Schwartz HA, Sap M, Smith LK, & Ungar LH (2016). Gaining insights from social media language: Methodologies and challenges. Psychological Methods, 21(4), 507.
- Kessler RC, Abelson J, Demler O, Escobar JI, Gibbon M, Guyer ME, . . . Walters EE (2004). Clinical calibration of DSM‐IV diagnoses in the World Mental Health (WMH) version of the World Health Organization (WHO) Composite International Diagnostic Interview (WMH‐CIDI). International Journal of Methods in Psychiatric Research, 13(2), 122–139.
- Kessler RC, Warner LCH, Ivany LC, Petukhova MV, Rose S, Bromet EJ, . . . Ursano RJ (2015). Predicting US Army suicides after hospitalizations with psychiatric diagnoses in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). JAMA Psychiatry, 72(1), 49–57.
- Matuschek H, Kliegl R, Vasishth S, Baayen H, & Bates D (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.
- May AM, & Klonsky ED (2016). What Distinguishes Suicide Attempters From Suicide Ideators? A Meta‐Analysis of Potential Factors. Clinical Psychology: Science and Practice, 23(1), 5–20.
- Millner AJ, Lee MD, & Nock MK (2015). Single-item measurement of suicidal behaviors: Validity and consequences of misclassification. PLoS ONE, 10(10), e0141606.
- Millner AJ, Lee MD, & Nock MK (2016). Describing and measuring the pathway to suicide attempts: A preliminary study. Suicide and Life-Threatening Behavior.
- Murphy SL, Xu J, & Kochanek KD (2013). Deaths: Final data for 2010. National Vital Statistics Reports, 61(4), 1–117.
- Nobles AL, Glenn JJ, Kowsari K, Teachman BA, & Barnes LE (2018). Identification of imminent suicide risk among young adults using text messages. Paper presented at the Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, Montréal, QC, Canada.
- Nock MK, Borges G, & Ono Y (2012). Suicide: Global Perspectives from the WHO World Mental Health Surveys. Cambridge University Press.
- Nock MK, Deming CA, Cha CB, Chiu WT, Hwang I, Sampson NA, . . . Beautrais A (2012). Sociodemographic risk factors for suicidal behavior: Results from the WHO World Mental Health Surveys. In Suicide: Global Perspectives from the WHO World Mental Health Surveys (pp. 86–100).
- Nock MK, Holmberg EB, Photos VI, & Michel BD (2007). Self-Injurious Thoughts and Behaviors Interview: Development, reliability, and validity in an adolescent sample. Psychological Assessment, 19(3), 309–317.
- Nock MK, Hwang I, Sampson N, & Kessler RC (2010). Mental disorders, comorbidity, and suicidal behaviors: Results from the National Comorbidity Survey Replication. Molecular Psychiatry, 15(8), 868–876.
- Nock MK, Park JM, Finn CT, Deliberto TL, Dour HJ, & Banaji MR (2010). Measuring the suicidal mind: Implicit cognition predicts suicidal behavior. Psychological Science, 21(4), 511–517.
- Onnela J-P, & Rauch SL (2016). Harnessing smartphone-based digital phenotyping to enhance behavioral and mental health. Neuropsychopharmacology, 41(7), 1691.
- Pennebaker JW, Boyd RL, Jordan K, & Blackburn K (2015). The development and psychometric properties of LIWC2015. UT Faculty/Researcher Works.
- Pestian JP, Matykiewicz P, & Linn-Gust M (2012). What’s In a note: construction of a suicide note corpus. Biomedical Informatics Insights, 5, 1.
- Peugh JL (2010). A practical guide to multilevel modeling. Journal of School Psychology, 48, 85–112.
- R Core Team. (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
- Reynolds SK, Lindenboim N, Comtois KA, Murray A, & Linehan MM (2006). Risky assessments: participant suicidality and distress associated with research assessments in a treatment study of suicidal behavior. Suicide and Life-Threatening Behavior, 36(1), 19–34.
- Rothman KJ (1990). No adjustments are needed for multiple comparisons. Epidemiology, 1(1), 43–46.
- Smith JM, Alloy LB, & Abramson LY (2006). Cognitive vulnerability to depression, rumination, hopelessness, and suicidal ideation: Multiple pathways to self-injurious thinking. Suicide and Life-Threatening Behavior, 36(4), 443–454.
- Sohn S, Torii M, Li D, Wagholikar K, Wu S, & Liu H (2012). A hybrid approach to sentiment sentence classification in suicide notes. Biomedical Informatics Insights, 5(Suppl 1), 43.
- Stirman SW, & Pennebaker JW (2001). Word use in the poetry of suicidal and nonsuicidal poets. Psychosomatic Medicine, 63(4), 517–522.
- Thompson MP, Kaslow NJ, Short LM, & Wyckoff S (2002). The mediating roles of perceived social support and resources in the self-efficacy-suicide attempts relation among African American abused women. Journal of Consulting and Clinical Psychology, 70(4), 942.
- Torous J, Kiang MV, Lorme J, & Onnela J-P (2016). New tools for new research in psychiatry: a scalable and customizable platform to empower data driven smartphone research. JMIR Mental Health, 3(2), e16.
- Van Orden KA, Witte TK, Cukrowicz KC, Braithwaite SR, Selby EA, & Joiner TE (2010). The interpersonal theory of suicide. Psychological Review, 117, 575–600.
- Venek V, Scherer S, Morency L-P, Rizzo A, & Pestian J (2014). Adolescent suicidal risk assessment in clinician-patient interaction: A study of verbal and acoustic behaviors. Paper presented at the Spoken Language Technology Workshop (SLT), 2014 IEEE.
- Walsh CG, Ribeiro JD, & Franklin JC (2017). Predicting risk of suicide attempts over time through machine learning. Clinical Psychological Science, 5(3), 457–469.
- World Health Organization. (2009). Global health risks: Mortality and burden of disease attributable to selected major risks. Geneva.
- World Health Organization. (2014). The World Health Organization world mental health composite international diagnostic interview (WHO WMH-CIDI). Retrieved on January 3rd.
- Yang H, Willis A, De Roeck A, & Nuseibeh B (2012). A hybrid model for automatic emotion recognition in suicide notes. Biomedical Informatics Insights, 5(Suppl 1), 17.