Abstract
Background:
Language use is of increasing interest in the study of mental illness. Analytical approaches range from phenomenological and qualitative through formal computational quantitative methods. Practically, the approach may have utility in predicting clinical outcomes. We harnessed a real-world sample (blog entries) from groups with psychosis, strong beliefs, odd beliefs, illness, mental illness, and/or social isolation to validate and extend laboratory findings about lexical differences between psychosis and control subjects.
Methods:
We describe the results of two experiments using Linguistic Inquiry and Word Count (LIWC) software to assess word category frequencies. In experiment 1, we compared word use in psychosis and control subjects in the laboratory (23/group), and related results to subject symptoms. In experiment 2, we examined lexical patterns in blog entries written by people with psychosis and 8 comparison groups. In addition to between-group comparisons, we used factor analysis followed by clustering to discern the contributions of strong belief, odd belief, and illness identity to lexical patterns.
Findings:
Consistent with others’ work, we found that first person pronouns, biological process words, and negative emotion words were more frequent in psychosis language. We tested lexical differences between bloggers with psychosis and multiple relevant comparison groups. Clustering analysis revealed that word use frequencies did not group individuals with strong or odd beliefs, but instead grouped individuals with any illness (mental or physical).
Interpretation:
Pairing of laboratory and real-world samples reveals that lexical markers previously identified as specific language changes in depression and psychosis are likely markers of illness in general.
Keywords: psychosis, schizophrenia, depression, lexical analysis, self-reference
1. Introduction
Language has long been studied in schizophrenia. We and others have recently hypothesized that language features may do more than describe experience and mark thought disorder; lexical markers may report on underlying psychopathology. Here, we harnessed the power of “real-world” writing from public blogs to test the ecological validity and specificity of previously reported differences between psychosis and control subjects in laboratory samples.
1.1. Computational methods for lexical analysis can distinguish language from psychosis and control groups
Recently, computational methods have facilitated the rapid assessments of language features in large text corpora. Several groups have compared the frequencies of words or word categories in psychotic versus control subjects, or used language to classify subjects by diagnosis (Hong et al., 2015),(Strous et al., 2009),(Junghaenel, 2008), by degree of anhedonia (Buck et al., 2015a), and by social functioning and metacognition (Minor et al., 2015).
1.2. Self-referential words as markers of psychosis
Frequency of self-referential pronouns is a potential marker of psychosis. Three groups have independently identified self-referential pronouns as differing in frequency between psychotic and non-psychiatric control subjects: Strous et al. (Strous et al., 2009) in Hebrew writing samples, Hong et al. (Hong et al., 2015) in English speech samples, and Buck and Penn (Buck and Penn, 2015) in English speech samples. Furthermore, two of the three diagnostic classifiers (computational algorithms to sort cases into diagnostic categories, in this case based on word frequencies) which have been reported rely heavily on self-referential pronouns, and both achieve >75% accuracy to distinguish psychosis from control subjects (Hong et al., 2015, Strous et al., 2009).
1.3. Combining lexical features with both patient samples and innovative controls opens a new window into etiology of lexical changes in psychosis
We know that specific word frequencies differ in psychosis versus control subjects in the laboratory – we aimed to extend this finding to understand how well these observed differences generalize to language samples produced outside the lab or clinic, and what aspects of psychotic illness are associated with these changes.
We tested lexical features in two corpora, one from patient interviews and another from online blog entries. The interview corpus allowed us to replicate others’ findings and relate the observed lexical markers to specific symptoms. In particular, we tested three candidate lexical markers of psychosis suggested by others’ work: two markers of self-reference (I, biological processes) (Buck and Penn, 2015, Hong et al., 2015, Strous et al., 2009), and negative emotion (Buck et al., 2015b, Cohen et al., 2008, Cohen et al., 2009, Minor et al., 2015).
The blog corpus included carefully chosen comparison groups aimed at differentiating the experiences of mental illness, chronic illness, strong belief system, odd beliefs, and social isolation. In addition to validating lexical differences in real-world samples, we aimed to understand the dimensions of psychosis relevant to observed lexical differences.
2. Methods:
2.1. Speech study
2.1.1. Interview subjects:
This protocol was approved by the Yale Institutional Review Board and was consistent with the Declaration of Helsinki. We recruited 23 subjects with psychosis from the inpatient psychiatric hospital and our outpatient clinic. We included people currently treated for psychosis in the context of schizophrenia, schizoaffective disorder, or bipolar disorder. Hospitalized patients were admitted for current exacerbation of psychotic symptoms. Subjects with stable psychosis and 23 control subjects without any history of psychosis were identified by posters and word of mouth, and initial inclusion criteria were verified by brief telephone screen. Written informed consent for all subjects was obtained by standard procedures. See subject demographics in Table 1A (summary) and Table S1 (details).
Table 1A. Interview Subject Demographics and Symptoms.
This table summarizes demographic and symptom data for subjects in the Interview Experiment. Diagnosis abbreviations: schizophrenia (scz), schizoaffective disorder (sczaff), bipolar disorder (Bipolar), psychosis NOS (NOS), depression (depr), anxiety (anx). Mean values for age, education, Scale for the Assessment of Positive Symptoms (SAPS), and Scale for the Assessment of Negative Symptoms (SANS) are presented followed by standard deviation in parentheses. P-value represents significance on the result of Student’s t-test comparing the two groups.
| Group | Setting | Diagnoses | N | Gender | Age | Education (yrs) | SAPS | SANS |
|---|---|---|---|---|---|---|---|---|
| Psychosis | 15 hosp, 8 stable | Scz-6, sczaff-8, Bipolar-7, NOS-1 | 23 | 13M | 35.2 (10.4) | 12.6 (1.4) | 32.1 (28.7) | 22.3 (12.2) |
| Control | | None-16, Depr-5, Anx-1, Bipolar-1 | 23 | 8M | 28.2 (13.7) | 14.8 (1.7) | 4.1 (6) | 2.3 (4) |
| p value | | | | 0.15 | 0.06 | 2.61 × 10^−5 | 4.24 × 10^−5 | 3.23 × 10^−9 |
2.1.2. Symptom Scales:
To quantify and describe psychotic symptoms in both groups, trained interviewers met with each subject to discuss psychiatric history and to administer the Scale for the Assessment of Positive Symptoms (SAPS) (Andreasen, 1984) and the Scale for the Assessment of Negative Symptoms (SANS) (Andreasen, 1983) (results Table 1A and Table S1).
2.1.3. Interview speech samples:
Interviews were digitally recorded and transcribed. Subjects responded to the prompt “We like to begin by hearing about you. Would you tell us a little about yourself?” Each was encouraged to continue talking for ten minutes. This technique is similar to that in recent reports of lexical analysis in schizophrenia (Cohen et al., 2008), (St-Hilaire et al., 2008), (Minor et al., 2015), (Buck et al., 2015a), (Buck et al., 2015b), (Strous et al., 2009), though others focused more on emotional experience (Cohen et al., 2009), (Buck and Penn, 2015), (Junghaenel, 2008), (Hong et al., 2015). This interview question was part of a larger experiment with three subsequent speech prompts and psychological tasks, which will be reported elsewhere.
2.2. Blog study
This protocol was determined by the Yale Institutional Review Board to be exempt from oversight.
2.2.1. Blog groups:
To investigate the relationship between lexical features and specific aspects of psychosis, we compared word use in Psychosis blogs to blogs from eight comparison groups. The comparison groups were blog-writers with chronic illness, mental illness, odd beliefs, strong beliefs, and social (dis)engagement (Table 1B). Each group was selected to share some, but not others, of these specific features. Each group was self-identified: the blogger declared that they belonged to the group, and there was no evidence that it was a parody. We read each blog in the Psychosis group for further evidence of psychotic illness, such as discussion of hallucinations, delusions, other symptoms, professional diagnosis, medication use, side effects, psychiatric hospitalization, legal problems, and other recovery-oriented activities (Figure 1B, more detail in Table S2).
Table 1B.
Blog groups are listed with number of entries in each group and features of interest.
| Group | n | chronic illness | mental illness | odd belief | strong belief | Pro-social | embraces group identity |
|---|---|---|---|---|---|---|---|
| Psychosis | 54 | + | + | + | + | ||
| Morgellon’s | 37 | + | + | + | + | + | + |
| Depression | 48 | + | + | ? | |||
| Aspie | 45 | + | + | + | |||
| Spinal Cord Injury | 41 | + | |||||
| Cancer | 48 | + | +/− | +/− | |||
| Evangelist | 48 | + | + | + | |||
| Anti-religion | 49 | + | +/− | + | |||
| Conspiracy | 44 | + | + | +/− | + |
Figure 1. Blog authors have features of psychosis.

1A) This table reports on the Psychosis blog sample with regard to gender distribution of writers and what fraction of the sample included mention of each listed feature of psychosis. 1B) This graph reports on the density of psychosis features in the sample, such that the left-most bar shows that our sample included two blogs mentioning 1 feature, and the right-most bar shows that our sample included three blogs that mentioned 7 features.
PSYCHOSIS GROUP:
1) Psychosis (n=54): These bloggers identified as having schizophrenia or schizoaffective disorder. Their heterogeneous experiences included strong and/or odd beliefs, hallucinations, experiences of social stigma and feelings of isolation, as well as physical discomfort and sometimes a sense of pride or community. We aimed to identify language markers of these aspects of the experience of psychosis in the blogs.
COMPARISON GROUPS:
Illness Groups:
Five groups were formed using blog entries written by people with physical or psychiatric conditions. These groups varied in degree of strong or odd beliefs, physical discomfort, social isolation, and sense of community.
Morgellon’s (n=37): This group is from a largely internet-based community of people who rally around physical symptoms (skin lesions, fatigue, and overall poor health) that are very bothersome, and that they believe result from infestation or environmental exposure. Most medical experts understand this to be in the psychiatric realm, a kind of delusional parasitosis (Mortillaro et al., 2013) (Misery, 2013) (Hylwa et al., 2012) (Pearson et al., 2012). The bloggers described their distress, their failed efforts to come to common understanding with medical professionals, and relief at finding community. This group had features of strong beliefs, odd beliefs, chronic illness (diagnosis), physical discomfort (itching and pain), and sense of community (group of believers online).
Depression (n=48): People with depression, especially those with suicidality, increase self-reference and bodily focus in their language (Pennebaker, 2011). In a cognitive behavioral frame, depression is understood in terms of maladaptive beliefs, e.g. (Beck, 1992), and people with depression have been shown to exhibit depressive realism – a tendency to make more accurate, less positively-biased beliefs than control subjects (Alloy and Abramson, 1979). This group had features of social isolation, chronic illness (diagnosis), and some had physical discomfort.
Aspie (n=45): Some people with high-functioning autism spectrum disorder (previously called Asperger’s) self-identify as “Aspies”. We included this group because they are people with a chronic mental illness who have embraced this identity. We searched for blog authors who self-identified as “Aspie” – not other autism blogs. These blogs included much speculation about what it is like for non-Aspies to emote and relate, and, for several bloggers, the relief at learning, as adults, that they are Aspies. This group has features of social isolation, chronic illness, and strong sense of community.
Spinal Cord Injury (SCI, n=41): We included this group with chronic non-psychiatric illness that may be visible to others (due to gait change or wheelchair use, for example). As the symptoms and behaviors of people with psychosis can at times make them recognizable to others in the community, we expected that many people with SCI may also feel social loss due to the visibility of their condition and attendant social stigma. However, we note that among the bloggers we identified, many had dramatic stories of physical and social feats: climbing mountains, establishing social organizations, etc., after their injury and often found a sense of community online. This group had features of social isolation, chronic illness, physical discomfort, and, for some, a strong sense of community.
Cancer (n=48): We included this group because they have an uncertain prognosis in the non-psychiatric realm: diagnosis may be very upsetting, and could be both isolating (due to not feeling well, others going on with their lives) and socially supportive (strong family support, support groups, gentle care in the hospital). Though we would not suggest that people would be happy to have cancer, many of these bloggers were proud to be survivors. In sum, this group had features of social isolation, chronic illness, physical discomfort, and strong sense of community (as above).
No Illness Groups:
Three groups were formed without illness. They were characterized by strong and/or odd beliefs, but no spoken-of physical or mental health problems.
6) Evangelist (n=48): We selected this group for their strongly held beliefs that are accepted by much of society (not “odd”). They described worries and conflicts with others who don’t agree, but more of the focus was on explicating the beliefs, and on the positive affect related to participation in worldwide, local, and family groups. This group had features of strong belief (religious doctrine), social engagement, and related sense of pride/community (church, missionary work for some).
7) Anti-religion (n=49): We selected this group for their strongly held beliefs that, although perhaps less popular than religious beliefs, are still broadly accepted. Many of the bloggers in this group identified as having left (mostly Judeo-Christian) groups, and many were angrily separating themselves. This group had features of strong belief, social isolation (many left religious groups), and a sense of community online.
8) Conspiracy (n=44): We selected this group for their identification with strongly held beliefs that are often considered odd -- not as acceptable as Evangelist and Anti-Religion beliefs. They were notable for seeking out others with shared beliefs – many of the blogs included extensive lists of links to other similar blogs – but also for describing the groups they oppose. Many were deeply involved with seeking out evidence to corroborate their beliefs. This group had features of strong belief, odd belief, and both isolation and community due to these beliefs. Though some may have clinical diagnoses, we saw no mention of psychiatric diagnosis, specific symptom-like experiences (other than their strong/odd belief), or psychiatric treatment on the blogs we included.
2.2.2. Blog collection:
We collected the most recent 150+ word entry from blogs maintained by single authors (not edited or curated by someone else), written in English, and publicly available on the Internet. One trained study team member read each blog entry for inclusion criteria and processing. Each entry was formatted into a single plain text file using standard procedures (Pennebaker, 2007). This included correcting spelling errors when the intended word was obvious. We also removed citations and quotations longer than two sentences.
2.2.3. Lexical analysis:
Linguistic Inquiry and Word Count 2007 (LIWC) is a software program that counts the frequency of 68 particular word categories in a text corpus (Pennebaker, 2007) including function words (words that structure the sentence, such as pronouns, prepositions) and content words (words that indicate topic, such as nouns, regular verbs, and some adverbs and adjectives). For between-group comparisons, we compared the frequency of 54 categories (we excluded punctuation and some grammar super-categories, such as “function words” and “pronouns”).
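The category-counting step can be illustrated with a minimal sketch. It is not the LIWC program: the three-category dictionary, the tokenizer, and the function names below are hypothetical stand-ins (the LIWC 2007 dictionary is proprietary and much larger), and the sample sentence is invented. The sketch only shows the general idea of expressing each category as a percentage of all words in a text.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; NOT the LIWC 2007 lexicon.
# A trailing "*" marks a stem that matches any word beginning with it.
TOY_CATEGORIES = {
    "i": ["i", "me", "my", "mine", "myself"],
    "bio": ["body", "health*", "eat*", "pain*"],
    "negemo": ["sad", "hurt*", "afraid", "worr*"],
}

def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())

def matches(token, pattern):
    """Exact match, or prefix match when the pattern ends in '*'."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern

def category_frequencies(text, categories=TOY_CATEGORIES):
    """Return each category's share of all tokens, as a percentage."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for name, patterns in categories.items():
            if any(matches(token, p) for p in patterns):
                counts[name] += 1
    total = len(tokens)
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in categories
    }

# Invented sample sentence, 9 tokens long.
sample = "I worry about my health and my body hurts"
freqs = category_frequencies(sample)
```

In this toy sentence, three of nine tokens fall in the "i" category and two each in "bio" and "negemo". A per-entry table of such percentages is the kind of input the between-group comparisons below operate on.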
2.3. Statistical analyses
2.3.1. Between-group comparisons:
For between-group comparisons, we checked that each word category was used with >1% mean frequency. For the interview sample, we used two-tailed Student’s t-tests to examine differences between the psychosis and control groups. For the blog sample, we used one-way ANOVA to examine differences between the 9 groups. For both samples, we set significance at p < 0·05 after false discovery rate (FDR) correction (Benjamini, 1995) to reduce the likelihood of Type I error from 54 simultaneous word categories tested. For the ANOVA, we used Tukey post-hoc tests for pair-wise comparisons between the Psychosis group and each of the other groups, with significance set at p < 0·05. In the interview corpus, thirty-five of the fifty-four categories we tested represented more than 1% of the words in the sample (our cut-off for inclusion in further analysis). Of these, eight categories were significantly different between groups (Table S4).
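The FDR step can be sketched as follows, assuming the standard Benjamini-Hochberg step-up adjustment. The function name and the five raw p-values are illustrative only and are not taken from the study; the software actually used for the correction is not specified here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted

# Invented raw p-values, one per hypothetical word category.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```

With these invented inputs, the third and fourth categories have raw p < 0·05 but adjusted values of about 0·051, so they would not be called significant after correction. This is the kind of Type I error control the procedure is intended to provide.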
2.3.2. Correlation analyses:
Pearson’s correlations were used to test the relationships between the three LIWC categories that are candidate lexical markers of psychosis based on the previous work described in the introduction of this paper (I, Biological Processes, and Negative Emotions) and interview subject symptom scores (SAPS and SANS) with significance set at p < 0·05 (results in Table S3). We included positive symptom sub-scales for hallucinations, delusions, thought disorder, and bizarre delusions. We also included negative symptom sub-scales for anhedonia, inattention, avolition, alogia, and affective flattening.
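For readers less familiar with the statistic, a minimal sketch of a Pearson correlation between a word-category frequency and a symptom score is given below. The function names and the five paired values are hypothetical and purely illustrative. The conversion to a t statistic with n − 2 degrees of freedom is the usual test of zero correlation and is assumed here rather than taken from the study's analysis scripts.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Invented paired values: category frequency (%) and a symptom score.
word_freq = [1.0, 2.0, 3.0, 4.0, 5.0]
symptom_score = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(word_freq, symptom_score)
t = correlation_t(r, len(word_freq))
```

The t statistic would then be compared against a t distribution with n − 2 degrees of freedom to obtain the p-value evaluated at the 0·05 threshold.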
2.3.3. Clustering:
We used an unbiased clustering approach to parse the blog entries into groups based on LIWC results. We entered the frequencies of the thirty-seven LIWC content word categories that were not umbrella categories, e.g. “see”, but not “perceptions”, into a principal components analysis to reduce them to five factors. Single blog entries were clustered based on these factor values using the “mclust” algorithm in the statistical program “R.” Mclust is a package for model-based clustering which determines the optimum number of clusters based on the Bayesian Information Criterion (BIC) (Fraley, 2012).
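The model-selection step can be illustrated with a short sketch of the BIC comparison alone. It assumes the sign convention used by mclust, BIC = 2 log L − k log n, where larger values are preferred, and a Gaussian mixture with unconstrained covariance matrices. The log-likelihood values are invented placeholders used only to exercise the code; they do not reproduce the study's fit, and mclust additionally compares several covariance parameterizations that are omitted here.

```python
import math

def gmm_param_count(n_clusters, n_dims):
    """Free parameters of a Gaussian mixture with unconstrained covariances."""
    mixing = n_clusters - 1                      # mixing proportions sum to 1
    means = n_clusters * n_dims                  # one mean vector per cluster
    covariances = n_clusters * n_dims * (n_dims + 1) // 2  # symmetric matrices
    return mixing + means + covariances

def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'larger is better' form: 2 log L - k log n."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)

N_ENTRIES = 414   # total blog entries summed over the nine groups in Table 1B
N_FACTORS = 5     # number of principal-component factors entered

# Placeholder log-likelihoods per candidate cluster count (NOT study values).
placeholder_loglik = {1: -3100.0, 2: -2950.0, 3: -2860.0, 4: -2835.0}

scores = {
    k: bic(ll, gmm_param_count(k, N_FACTORS), N_ENTRIES)
    for k, ll in placeholder_loglik.items()
}
best_k = max(scores, key=scores.get)
```

The penalty term grows with each added cluster (20, 41, 62, and 83 free parameters for one to four clusters in five dimensions), so a larger model is preferred only when its gain in log-likelihood outweighs the added complexity.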
3. Results:
3.1. Two language samples: a laboratory sample and a real-world sample
To validate and extend the above-described previous findings about word use in the language of people with psychosis, we collected speech samples from patient and control subject interviews (details Table 1A, Table S1) and writing samples from blogs by people with psychosis and eight comparison groups (details Table 1B, Table S2).
3.1.1. Interview corpus
We compared demographics in the two interview subject groups (Table 1A, Table S1). Psychosis and control subject groups were age and gender matched. The control group had on average two more years of education than the psychosis group. Psychosis group diagnoses (made by their clinicians prior to the study) included schizophrenia, schizoaffective, and bipolar disorder. As expected, our interview elicited more positive and negative symptoms from the psychosis group.
3.1.2. Blog corpus
We collected blog entries from writers with psychosis and each of eight other groups designed to test aspects of psychosis that might be associated with language changes (Table 1B). Each writer contributed one entry to the corpus. We examined each psychosis group blog (multiple entries) for mentions of the writer having experiences consistent with psychosis. These features were present in many of the blogs (Figure 1A) and many of the blogs included multiple features (Figure 1B).
3.2. Three candidate lexical markers of psychosis
We analyzed our interview sample, in which subjects spoke for 10 minutes on the prompt “Tell me about yourself”, for putative lexical markers of psychosis (Buck et al., 2015b, Buck and Penn, 2015, Cohen et al., 2008, Cohen et al., 2009, Hong et al., 2015, Junghaenel, 2008, Minor et al., 2015, St-Hilaire et al., 2008, Strous et al., 2009) (Figure 2, black bars). First-person pronouns (Figure 2A black bars, t = 3.66, FDR adjusted p = 7·00 × 10^−3) and biological process words (Figure 2B black bars, t = −4·34, FDR adjusted p = 1.67 × 10^−3) were more frequent in Psychosis than Control. Negative emotion words (Figure 2C black bars, t = 3.66, FDR adjusted p = 0·02) were also more frequent in Psychosis than Control.
Figure 2. Candidate lexical marker frequencies in interview and blog corpora.

Mean frequencies of candidate lexical markers (I, bio, and negemo) in the interview (black bars) and blog (grey bars) corpora. Asterisks denote mean frequencies that significantly differ (p<0.05). Plus signs denote mean frequencies that trend toward significant difference (p<0.1).
Furthermore, frequency of these candidate lexical markers correlated with use of the other candidate markers (Table S3A), positive symptom scores (Table S3B), and negative symptom scores (Table S3C).
3.3. Lexical patterns united bloggers with illness more than bloggers with strong or odd beliefs
In light of previous work identifying self-referential and negative emotion words as more frequent in both psychosis and depression (Bedi, 2015, Buck et al., 2015a, Buck and Penn, 2015, Hong et al., 2015, Junghaenel, 2008, Minor et al., 2015, Strous et al., 2009), we thought that these words could be markers of mental illness. However, we predicted that overall word use pattern in Psychosis would most reflect the strong odd beliefs (delusions) that often pervade the language of people with psychotic disorders. We used factor analysis followed by clustering to describe the patterns of word frequencies in the 37 thematic (non-grammar) categories. We predicted that these methods would reveal similarities between psychosis and other writers with strong odd beliefs (Conspiracy theorists, people with Morgellon’s syndrome) and to a lesser degree, writers with strong beliefs (Evangelists, Anti-religion writers). Contrary to our prediction, we found more word categories significantly differing from Psychosis in Conspiracy than any other group (Figure 3A). Factor analysis revealed that word use in the Psychosis group was more similar to that of other groups with illness than to that of non-illness groups (Figure 3B, C). To verify these groupings, we applied an unbiased clustering algorithm to the factor data. Each blog entry (without consideration of group) was assigned to a cluster: a three-cluster solution emerged. Psychosis (and other illness group) blog entries were predominantly assigned to cluster 2, whereas Conspiracy (and other strong belief) blogs were predominantly assigned to cluster 3 (Figure 3D).
Figure 3. Blog corpus thematic word frequencies and clustering results.

A) Number of word categories significantly different from Psychosis is listed for each group. B, C) Factor weights for each group are plotted. D) Number of blogs assigned to each cluster is marked for each group.
3.4. The three candidate markers were increased across illness groups
To test the revised hypothesis that the lexical markers that are increased in psychosis and depression are actually markers of illness in general rather than of having strong odd beliefs or mental illness in particular, we checked the frequency of self-referential and negative emotion words in the blog corpus. They did follow the predicted pattern (Figure 2, grey bars). “I” was most frequent in psychosis, less in the Ill group, and even less in the Non-ill group (Figure 2A, grey bars. Means: 8·16%, 6·33%, 2·84%, ANOVA F=82·71 and p=7·61 × 10^−31, and p<0·05 for psychosis v ill, psychosis v non-ill, and ill v non-ill). “Bio” was more frequent in psychosis and the Ill group than in the Non-ill group (Figure 2B, grey bars. Means: 3·18%, 3·33%, 2·31%, ANOVA F=10·71 and p=2·94 × 10^−5, and p<0·05 for psychosis v non-ill and ill v non-ill). “Negemo” was most frequent in psychosis, less in the Ill group, and even less in the Non-ill group (Figure 2C, grey bars. Means: 2·39%, 2·25%, 1·91%, ANOVA F=3·65 and p=0·03, and p<0·1 for psychosis v non-ill, and ill v non-ill).
Furthermore, in the interview corpus, these three categories correlated with each other and with symptom burden (Table S3).
4. Discussion:
We replicated the finding that first-person pronouns are increased in Psychosis (Bedi, 2015, Buck et al., 2015a, Buck and Penn, 2015, Hong et al., 2015, Junghaenel, 2008, Minor et al., 2015, Strous et al., 2009). We found that another self-referential category, biological processes, which includes words such as body and health, was more frequent in Psychosis. Self-reference word categories such as “I”, “ingestion,” and “sex” have been previously identified as markers of depression (Oxman et al., 1988, Rude et al., 2004) and suicidality (Baddeley et al., 2011, Lester, 2009, Stirman and Pennebaker, 2001). In a corpus of published essays, we have previously found more self-referential words and fewer third-person plural words (“they”) in depression than schizophrenia (Fineberg et al., 2014).
We replicated these findings in the more ecologically valid setting of blog writing: candidate markers were increased in psychosis versus eight comparator groups (for I, negemo). However, using factor analysis and clustering, we found that word use patterns across a large set of categories do not align groups that share the presence of strongly held or odd beliefs. Rather, patterns are most similar across groups with illness. Furthermore, in the interview sample, the three candidate lexical markers correlated with symptom intensity, and each marker was more frequent in the illness than non-illness blogs. Future work will determine if more symptomatic patients with other illnesses also use more of these lexical features.
Our finding that frequencies of these three lexical markers are positively correlated with one another in our interview subjects could suggest that these lexical markers relate to some common illness-associated process in these subjects.
4.1. Self-reference and illness experience
Illness is attended by shifts in self-concept and discomforts that pull one’s attention inward. An fMRI marker of psychosis, dorsolateral pre-frontal cortex (DLPFC) activation, also correlates with distress (Corlett and Fletcher, 2012). Furthermore, belief-associated distress separates people with schizophrenia from others with high magical ideation, e.g. conspiracy theorists (Swami et al., 2010), and predicts delusion-motivated violence (Coid et al., 2013).
The similarity in word use among the ill groups in our sample also accords with research in neuro-immunology. Circulating inflammatory markers can promote “sickness behaviors”, a syndrome of low energy and depressed mood quite similar to Major Depression (Dantzer et al., 2008). Psychiatric symptoms are also linked to interferon-alpha treatment of Hepatitis C (Smith et al., 2011), post-coronary recovery period (Lafitte et al., 2015), and chronic gut inflammation (Zonis et al., 2015). Inflammation has also been implicated in PTSD (O’Donovan et al., 2015) and schizophrenia (M et al., 2015). We are not aware of any studies directly testing the relationship between inflammatory markers and word use patterns, but future studies relating language markers to inflammatory processes in psychiatric syndromes may help target therapy.
4.2. Affective words
Previous data on affect word use in Psychosis are mixed. Some have found more negative emotion words in Psychosis versus control, specifically in discussion of neutral/unpleasant vs. pleasant topics (Cohen et al., 2009). In Psychosis with anhedonia, both decreased (Cohen et al., 2008) and increased use have been documented (Cohen et al., 2009). One study found that patients with anhedonia use more negative emotion words when prompted to talk about pleasant experiences (Cohen et al., 2009)! However, Junghaenel et al. found no relationship between affect word use and anhedonia (Buck et al., 2015b), and they and others have reported no significant difference in negative emotion words in schizophrenia versus control subjects (Junghaenel, 2008) (St-Hilaire et al., 2008). Two groups have reported no differences between groups in positive emotion word use (Cohen et al., 2009), (Cohen et al., 2008), but Junghaenel et al. found fewer positive emotion words in Schizophrenia versus control subjects (F=4.9, p<0.05) (Junghaenel, 2008). In our speech samples, negative emotion words were more frequent in Psychosis than Control, and positive emotion words did not differ.
As our sample seems to have included more acutely psychotic subjects (many of our subjects were interviewed during psychiatric hospital admissions, whereas other studies included mostly stable outpatients), we wondered whether negative emotion words might correlate with symptom burden, perhaps explaining differences between reported results. Others have found that emotion word use in schizophrenia but not control subjects was positively correlated to trait anxiety (St-Hilaire et al., 2008). In another study, negative emotion word frequency correlated to PANSS score (Minor et al., 2015), and to the global affective flattening subscale of the SANS (not the overall SANS score) (Cohen et al., 2008). Positive emotion words did not correlate to SAPS or SANS (Cohen et al., 2008) or PANSS (Minor et al., 2015). Positive but not negative emotion word frequency correlated with social functioning (Cohen et al., 2008). Here, we found that frequency of negative emotion words was positively correlated with positive and negative symptom burden (Table S3). This, together with the positive correlations between self-referential/social words and symptom scores suggests that these word categories could be markers of the intensity of symptoms.
4.3. Study limitations:
Though our interview sample is matched on age and gender, we note that the discrepancy in education may reflect not only differences in years in the classroom, but also the vulnerable age when many of our psychosis subjects may have had education interrupted by symptom onset. By contrast, many of our control subjects are students whose trajectory is toward yet more education.
Language collected from blogs also has important limitations: most critically, we cannot verify the identity or affiliations of the authors. Without meeting these writers or having access to their psychiatric histories, we cannot be certain of how well each blogger fits into their assigned group, and how many might possibly fit into another group as well. This does increase the potential for greater heterogeneity in the samples. However, we did find mention of experiences consistent with having a psychotic disorder in most of the psychosis blogs (Figure 1). Furthermore, we did see similar word categories differ in our Interview sample (including many acutely ill patients). Also, writers with diagnosed conditions may be a subset of those with illness: those doing well enough to engage in the blogging project. Our correlation results from the Interview experiment do, however, suggest that self-referential and negative emotion words actually correlate with more, not fewer, symptoms.
Lastly, Conspiracy, Anti-religion, and Evangelist bloggers may write more about strong belief than about themselves, which could account for some of the observed differences. However, considering these data in the context of interview studies (ours and others’) suggests that the differences we observed are in fact quite similar to those seen in Psychosis and Control subjects responding to the same prompts. Nonetheless, future studies examining language from people talking about strong beliefs, or responding to other kinds of prompts, will be important.
5. Conclusions:
The similarities we have observed in lexical markers across mental and physical illnesses reinforce the importance of considering common mechanisms as we describe the pathophysiology of mental illness. This work does not allow us to establish causality (which would be better addressed in a prospective repeated-measures design). Future work will help us to understand the relative contributions of the social and biological aspects of illness to changes in lexical markers. In particular, lexical analyses may be of interest as novel therapies are introduced: word use metrics can be simply (and cheaply) tracked through illness onset, recovery, and relapse across diagnoses. Based on our present findings, word use may represent an assay of current clinical status, distress, and discomfort, regardless of specific symptoms and diagnosis.
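As an illustration of how simple such a word use metric can be, the sketch below computes the percentage of words in a text that fall into each of two categories. The tiny word lists are hypothetical stand-ins written for this example; they are not the LIWC dictionary, which is proprietary and also matches word stems, whereas this sketch matches exact lowercase tokens only.

```python
import re

# Hypothetical mini-dictionary for illustration only; NOT the LIWC dictionary.
CATEGORIES = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "negative_emotion": {"sad", "afraid", "angry", "hurt", "worried"},
}


def category_percentages(text, categories=CATEGORIES):
    """Percent of word tokens in `text` that belong to each category."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in categories}
    return {
        name: 100.0 * sum(tok in words for tok in tokens) / len(tokens)
        for name, words in categories.items()
    }


if __name__ == "__main__":
    print(category_percentages("I am sad today"))
```

Applied to dated writing samples from the same person, the same function would yield a series of percentages that could be followed over time.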
Supplementary Material
Table S1. Detailed interview subject information
Table S2. Detailed blog subject information
Table S3. Word category and symptom correlations
Table S4. Interview corpus statistics for between-group comparisons
Acknowledgments
Funding: This work was supported by the Connecticut Mental Health Center (CMHC) and Connecticut State Department of Mental Health and Addiction Services (DMHAS). PRC was funded by an IMHRO / Janssen Rising Star Translational Research Award and CTSA Grant Number UL1 TR000142 from the National Center for Research Resources (NCRR) and the National Center for Advancing Translational Science (NCATS), components of the National Institutes of Health (NIH), and NIH roadmap for Medical Research. SKF was supported by NIMH Grant #- 5T32MH019961, “Clinical Neuroscience Research Training in Psychiatry,” and by a NARSAD Young Investigator Grant. The contents of this work are solely the responsibility of the authors and do not necessarily represent the official view of NIH or the CMHC/DMHAS.
References:
- Alloy LB & Abramson LY (1979). Judgment of contingency in depressed and nondepressed students: sadder but wiser? J Exp Psychol Gen 108, 441–85.
- Andreasen NC (1983). The Scale for the Assessment of Negative Symptoms (SANS). The University of Iowa: Iowa City, IA.
- Andreasen NC (1984). The Scale for the Assessment of Positive Symptoms (SAPS). The University of Iowa: Iowa City, IA.
- Baddeley JL, Daniel GR & Pennebaker JW (2011). How Henry Hellyer’s use of language foretold his suicide. Crisis 32, 288–92.
- Beck AT (1992). Cognitive Therapy - a 30 Year Retrospective. Which Psychotherapies in Year 2000? 6, 13–28.
- Bedi G, Carrillo F, Cecchi GA, Slezak DF, Sigman M, Mota NB, Ribeiro S, Javitt DC, Copelli M & Corcoran CM (2015). Automated analysis of free speech predicts psychosis onset in high-risk youths. NPJ Schizophrenia 1.
- Benjamini Y & Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society 57, 289–300.
- Buck B, Minor KS & Lysaker PH (2015a). Differential lexical correlates of social cognition and metacognition in schizophrenia; a study of spontaneously-generated life narratives. Compr Psychiatry 58, 138–45.
- Buck B, Minor KS & Lysaker PH (2015b). Lexical Characteristics of Anticipatory and Consummatory Anhedonia in Schizophrenia: A Study of Language in Spontaneous Life Narratives. J Clin Psychol 71, 696–706.
- Buck B & Penn DL (2015). Lexical Characteristics of Emotional Narratives in Schizophrenia: Relationships With Symptoms, Functioning, and Social Cognition. J Nerv Ment Dis 203, 702–708.
- Cohen AS, Alpert M, Nienow TM, Dinzeo TJ & Docherty NM (2008). Computerized measurement of negative symptoms in schizophrenia. J Psychiatr Res 42, 827–36.
- Cohen AS, St-Hilaire A, Aakre JM & Docherty NM (2009). Understanding anhedonia in schizophrenia through lexical analysis of natural speech. Cognition & Emotion 23, 569–586.
- Coid JW, Ullrich S, Kallis C, Keers R, Barker D, Cowden F & Stamps R (2013). The relationship between delusions and violence: findings from the East London first episode psychosis study. JAMA Psychiatry 70, 465–71.
- Corlett PR & Fletcher PC (2012). The neurobiology of schizotypy: fronto-striatal prediction error signal correlates with delusion-like beliefs in healthy people. Neuropsychologia 50, 3612–20.
- Dantzer R, O’Connor JC, Freund GG, Johnson RW & Kelley KW (2008). From inflammation to sickness and depression: when the immune system subjugates the brain. Nat Rev Neurosci 9, 46–56.
- Fineberg SK, Deutsch-Link S, Ichinose M, McGuinness T, Bessette AJ, Chung CK & Corlett PR (2014). Word use in first-person accounts of schizophrenia. Br J Psychiatry.
- Fraley C, Raftery AE, Murphy TB & Scrucca L (2012). mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation. Department of Statistics, University of Washington: Seattle, WA.
- Hong K, Nenkova A, March ME, Parker AP, Verma R & Kohler CG (2015). Lexical use in emotional autobiographical narratives of persons with schizophrenia and healthy controls. Psychiatry Res 225, 40–9.
- Hylwa SA, Foster AA, Bury JE, Davis MD, Pittelkow MR & Bostwick JM (2012). Delusional infestation is typically comorbid with other psychiatric diagnoses: review of 54 patients receiving psychiatric evaluation at Mayo Clinic. Psychosomatics 53, 258–65.
- Junghaenel DU, Smyth JM & Santner L (2008). Linguistic Dimensions of Psychopathology: A Quantitative Analysis. Journal of Social and Clinical Psychology 27, 36–55.
- Lafitte M, Tastet S, Perez P, Serise MA, Grandoulier AS, Aouizerate B, Sibon I, Capuron L & Couffinhal T (2015). High sensitivity C reactive protein, fibrinogen levels and the onset of major depressive disorder in post-acute coronary syndrome. BMC Cardiovasc Disord 15, 23.
- Lester D (2009). Learning about suicide from the diary of Cesare Pavese. Crisis 30, 222–4.
- Faugere M, Micoulaud-Franchi JA, Alessandrini M, Richieri R, Faget-Agius C, Auquier P, Lançon C & Boyer L (2015). Quality of life is associated with chronic inflammation in schizophrenia: a cross-sectional study. Sci Rep 5, 10793.
- Minor KS, Bonfils KA, Luther L, Firmin RL, Kukla M, MacLain VR, Buck B, Lysaker PH & Salyers MP (2015). Lexical analysis in schizophrenia: how emotion and social word use informs our understanding of clinical presentation. J Psychiatr Res 64, 74–8.
- Misery L (2013). [Morgellons syndrome: a disease transmitted via the media]. Ann Dermatol Venereol 140, 59–62.
- Mortillaro G, Rodgman C, Kinzie E & Ryals S (2013). A case report highlighting the growing trend of Internet-based self-diagnosis of “Morgellon’s disease”. J La State Med Soc 165, 334–6.
- O’Donovan A, Chao LL, Paulson J, Samuelson KW, Shigenaga JK, Grunfeld C, Weiner MW & Neylan TC (2015). Altered inflammatory activity associated with reduced hippocampal volume and more severe posttraumatic stress symptoms in Gulf War veterans. Psychoneuroendocrinology 51, 557–66.
- Oxman TE, Rosenberg SD, Schnurr PP & Tucker GJ (1988). Diagnostic classification through content analysis of patients’ speech. Am J Psychiatry 145, 464–8.
- Pearson ML, Selby JV, Katz KA, Cantrell V, Braden CR, Parise ME, Paddock CD, Lewin-Smith MR, Kalasinsky VF, Goldstein FC, Hightower AW, Papier A, Lewis B, Motipara S, Eberhard ML & Unexplained Dermopathy Study Team (2012). Clinical, epidemiologic, histopathologic and molecular features of an unexplained dermopathy. PLoS One 7, e29908.
- Pennebaker JW (2011). The secret life of pronouns: what our words say about us. Bloomsbury Press: New York.
- Pennebaker JW, Chung CK, Ireland M, Gonzales A & Booth RJ (2007). The development and psychometric properties of LIWC2007. [Software manual]. Austin, TX (www.liwc.net).
- Rude SS, Gortner EM & Pennebaker JW (2004). Language use of depressed and depression-vulnerable college students. Cognition & Emotion 18, 1121–1133.
- Smith KJ, Norris S, O’Farrelly C & O’Mara SM (2011). Risk factors for the development of depression in patients with hepatitis C taking interferon-alpha. Neuropsychiatr Dis Treat 7, 275–92.
- St-Hilaire A, Cohen AS & Docherty NM (2008). Emotion word use in the conversational speech of schizophrenia patients. Cogn Neuropsychiatry 13, 343–56.
- Stirman SW & Pennebaker JW (2001). Word use in the poetry of suicidal and nonsuicidal poets. Psychosom Med 63, 517–22.
- Strous RD, Koppel M, Fine J, Nachliel S, Shaked G & Zivotofsky AZ (2009). Automated characterization and identification of schizophrenia in writing. J Nerv Ment Dis 197, 585–8.
- Swami V, Chamorro-Premuzic T & Furnham A (2010). Unanswered Questions: A Preliminary Investigation of Personality and Individual Difference Predictors of 9/11 Conspiracist Beliefs. Applied Cognitive Psychology 24, 749–761.
- Zonis S, Pechnick RN, Ljubimov VA, Mahgerefteh M, Wawrowsky K, Michelsen KS & Chesnokova V (2015). Chronic intestinal inflammation alters hippocampal neurogenesis. J Neuroinflammation 12, 65.
