Author manuscript; available in PMC: 2024 Nov 1.
Published in final edited form as: J Psychopathol Clin Sci. 2023 Jul 20;132(8):972–983. doi: 10.1037/abn0000850

Depression and Anxiety Have Distinct and Overlapping Language Patterns: Results from a Clinical Interview

Elizabeth C Stade 1, Lyle Ungar 2, Johannes C Eichstaedt 3, Garrick Sherman 4, Ayelet Meron Ruscio 1
PMCID: PMC10799169  NIHMSID: NIHMS1915782  PMID: 37471025

Abstract

Depression has been associated with heightened first-person singular pronoun use (I-usage; e.g., “I,” “my”) and negative emotion words. However, past research has relied on nonclinical samples and nonspecific depression measures, raising the question of whether these features are unique to depression vis-à-vis frequently co-occurring conditions, especially anxiety. Using structured questions about recent life changes or difficulties, we interviewed a sample of individuals with varying levels of depression and anxiety (N = 486), including individuals in a major depressive episode (n = 228) and/or diagnosed with generalized anxiety disorder (n = 273). Interviews were transcribed to provide a natural language sample. Analyses isolated language features associated with gold standard, clinician-rated measures of depression and anxiety. Many language features associated with depression were in fact shared between depression and anxiety. Language markers with relative specificity to depression included I-usage, sadness, and decreased positive emotion, while negations (e.g., “not,” “no”), negative emotion, and several emotional language markers (e.g., anxiety, stress, depression) were relatively specific to anxiety. Several of these results replicated using a self-report measure designed to disentangle components of depression and anxiety. We next built machine learning models to detect severity of common and specific depression and anxiety using only interview language. Individuals’ speech characteristics during this brief interview predicted their depression and anxiety severity, beyond other clinical and demographic variables. Depression and anxiety have partially distinct patterns of expression in spoken language. Monitoring of depression and anxiety severity via language can augment traditional assessment modalities and aid in early detection.

Keywords: depression, anxiety, computational linguistics, natural language processing, machine learning

General Scientific Summary

There is growing scientific excitement about detecting depression from people’s language use, but this work rarely accounts for anxiety, which overlaps substantially and co-occurs frequently with depression. Using clinical interviews with individuals with varying levels of depression and anxiety, we found that some language patterns are shared by these conditions, whereas other patterns distinguish them. Depressed individuals show more I-usage (e.g., “I,” “me,” “my”) and sadness words (e.g., “low,” “sad,” “alone”), while anxious individuals use a much broader array of negative emotionality language (e.g., anxiety, stress, and counterintuitively, depression), raising implications for the understanding and automatic assessment of these conditions.


Much remains unknown about the nature of emotional disturbance. A major challenge is the covert nature of emotional experiences and the corresponding need for methods capable of revealing them. In recent years, computational linguistics has emerged as a promising method for illuminating psychological constructs (Kern et al., 2016). Computational linguistics integrates techniques from linguistics, cognitive science, and artificial intelligence to enable automated processing and analysis of human language. By detecting often subtle patterns in natural language that can reveal information about people’s thoughts and feelings (Kern et al., 2016), computational linguistics may offer a window into internal experiences (Tausczik & Pennebaker, 2010) and their disruption in emotional disorders.

Reflecting the promise of this method, a growing literature has examined the language of depression. This work has uncovered a consistent relationship between depression and first-person singular pronouns (“I-usage”; Edwards & Holtzman, 2017), suggesting increased self-focus (Ingram, 1990) or self-immersed perspective taking (Kross & Ayduk, 2011). Depression is also associated with negative emotion language (e.g., Eichstaedt et al., 2018), likely reflecting heightened levels of negative affect. The ability to index depression via language has generated considerable excitement, as it raises the possibility of automated, unobtrusive detection of depression at scale.

At present, however, this literature has significant limitations. Many studies have used unselected, convenience, or general treatment-seeking samples with low or unknown rates of clinically significant depression. Studies have also relied on self-report measures, despite concerns that these measures capture general distress rather than depression specifically, especially in nonclinical samples (Coyne, 1994; Kendall et al., 1987). Some studies have assumed depression diagnosis based on membership in online mental health forums or self-declared diagnosis on Twitter, allowing researchers to amass large datasets but raising questions about whether the results validly represent depression. A related question has concerned whether language features previously linked to depression are unique to this construct or more accurately reflect a higher-order factor like negative emotionality (Tackman et al., 2019). Failure to compare depression to other disorders raises the possibility that language features attributed to depression are driven by co-occurring conditions. As a result, machine learning models of depression that are built from language correlates of depression without considering their correlations with other disorders (Guntuku et al., 2017) risk misclassifying individuals.

Anxiety, in particular, co-occurs frequently with depression and has overlapping symptoms. This suggests that some features previously associated with depression may be better understood as shared with, or even resulting from, anxiety. Consistent with this possibility, studies have found anxiety, like depression, to be associated with I-usage and negative emotion words (Brockmeyer et al., 2015; Dirkse et al., 2015; Sonnenschein et al., 2018). Research on the language of anxiety, however, has been hindered by methodological limitations similar to those for depression. Additionally, although anxiety takes many forms, most studies have used general measures of anxiety that don’t distinguish between different forms. Few studies have focused on generalized anxiety disorder (GAD), which shares the strongest relationship with depression and represents its most challenging boundary condition (Watson, 2009). This close relationship makes GAD an especially good comparison condition for isolating language correlates of depression. However, the few studies that have examined language in GAD used unvalidated measures of the disorder (Dirkse et al., 2015) or tested different language features than those examined for depression, making comparison difficult (Geronimi & Woodruff-Borden, 2015).

There is a need for comprehensive language studies that compare depression and anxiety directly to distinguish their common and specific features. Uncovering common features informs parsimonious theoretical models and transdiagnostic interventions that benefit most patients. By contrast, identifying specific features is necessary for establishing discriminant validity (Hubley & Zumbo, 1996) and can inform development of targeted treatments. To date, however, almost no work has attempted to disentangle language features unique to depression vis-à-vis anxiety. We are aware of only two studies that examined depression and anxiety side-by-side (Brockmeyer et al., 2015; Sonnenschein et al., 2018); they tested a single language feature (I-usage) and were notably underpowered for these analyses.

The current study took a rigorous approach to examining the language of depression and anxiety. We used a well-characterized sample including individuals with major depressive disorder (MDD) and generalized anxiety disorder (GAD) as well as individuals falling below the threshold for diagnosis. We used measures of depression and anxiety severity that were clinician-rated to be specific to each disorder, which offer superior discrimination between constructs compared to self-report measures (Clark & Watson, 1991). Additionally, we sought to replicate our results using a self-report measure specifically designed to separate shared from unique symptoms of depression and anxiety. Subscales measure symptoms at broad, intermediate, and narrow levels of specificity, ranging from symptoms largely overlapping between depression and anxiety, to symptoms relatively nonspecific to depression vis-à-vis anxiety (and vice versa), to symptoms unique to depression vis-à-vis anxiety (and vice versa).

Following Kazak (2018), we posed primary and secondary hypotheses. Primary hypotheses tested language features that mapped most directly onto common and specific factors identified in the literature, while secondary hypotheses tested related but less central features. As depression and anxiety have each been linked to I-usage, we posed the primary hypothesis that their common factor would be as well. Additionally, as negative emotionality is heightened in emotional disorders (Barlow et al., 2014), we hypothesized that their common factor would be associated with negative emotion language, broadly defined (primary), as well as with stress, depression, sadness, anxiety, and anger language (secondary). As existing lexica were not developed to capture unique features of depression and anxiety, we expected that they would be dominated by language representing general distress, which is identified as the common factor of depression and anxiety in Clark and Watson’s influential tripartite model (1991) and its successors (Mineka et al., 1998; Watson, 2009). However, we acknowledged the plausible rival hypotheses that depression/sadness language would be associated specifically with depression, and anxiety language would be associated specifically with anxiety.

We also expected to observe specific language features of depression and anxiety. Based on theorizing that deficits in positive affect are relatively unique to depression (Clark & Watson, 1991), we hypothesized that depression would be negatively associated with reward and positive emotion language (primary) and leisure language (secondary). Conversely, given the central preoccupation with threat in anxiety disorders (Craske et al., 2009) and work identifying somatic anxiety as unique to these disorders (Mineka et al., 1998), we expected anxiety to be positively associated with risk and fear language (primary) and physiological sensation language (secondary).

In addition to these planned analyses, we conducted exploratory analyses. Exploratory analyses examined a wider range of language features, using a data-driven regression framework to search for novel features of depression and anxiety. Lastly, we built language-based models of depression and anxiety. Such models are trained using language features to predict a psychological construct; in subsequent applications, the model can be used to estimate a person’s score on that construct based on language alone. As existing models of depression have not taken specificity into account, we sought to develop language-based models of depression and anxiety that maximized the discriminability of these constructs.

Method

Participants

Participants were recruited from the Philadelphia community via online and in-person advertisements. Participants were screened to ensure they were at least 18 years old and either (a) experiencing symptoms of MDD and/or GAD or (b) had no psychopathology. Psychotropic medication was permitted at a stable dose. Individuals reporting heavy substance use or active psychosis were excluded. Eligible participants were invited to the lab and administered the Anxiety and Related Disorders Interview Schedule for DSM-5–Lifetime Version (ADIS-5L; Brown & Barlow, 2014) by a Master’s- or Bachelor’s-level clinical interviewer trained to a high level of reliability with an expert rater.

To enroll a sample that varied widely in depression and anxiety, all individuals who spoke at least 200 words in the ADIS-5L Introduction section and completed the ADIS-5L MDD module were eligible to participate. This yielded a mixed sample of 486 participants with and without psychopathology, of whom 167 were currently in a major depressive episode (MDE) and diagnosed with GAD, 106 currently had GAD without MDE, 61 currently had MDE without GAD, and 152 had neither disorder. Participants were primarily female (65%) and ranged in age from 18 to 80 (M = 32.89, SD = 12.83). The sample was racially diverse: 56% of participants were White, 26% were Black, 7% were Asian, and 11% were a different race; 8% reported their ethnicity as Hispanic/Latinx. Fifty-eight percent of participants completed college, and household income ranged from $0 to $500,000 (Mdn = $32,000, SD = $42,687).

A subset of participants returned to the lab and completed a self-report measure of depression and anxiety. This group included 239 individuals whose principal (most severe) disorder was MDD or GAD (n = 184) or who had no lifetime psychopathology (n = 55). This subsample included fewer Black participants than the total sample (χ² = 17.21, p = .001) but otherwise did not differ in sex, age, ethnicity, educational attainment, or household income, all p > .086.

Measures

Clinician-Assessed Depression and Anxiety

Interviewers assessed the presence and severity of depression and anxiety using the ADIS-5L. Each participant was assigned separate clinical severity ratings for major depression and GAD (used to operationalize depression and anxiety, respectively) on a scale from 0 (none) to 8 (very severely disturbing/disabling), with ratings of 4 (moderate) or higher indicating a clinically significant disorder. Diagnostic decisions and clinical severity ratings for each participant were finalized in weekly meetings of the assessment team. The final depression and anxiety severity variables were strongly related but not entirely overlapping (r = .55, p < .001). Interrater reliability was excellent for depression and anxiety diagnostic status (κ = 0.88 and 1.00, respectively) and severity (both ICC = .95) based on blind, independent ratings of recorded interviews (n = 32) from ongoing studies with these populations in our lab. As shown in the online supplement (see Table S1), participants with major depression had the highest depression severity, whereas participants with GAD had the highest anxiety severity and the highest total number of comorbid disorders.

Self-Reported Depression and Anxiety

A subset of participants completed the self-report Mood and Anxiety Symptom Questionnaire (MASQ; Watson et al., 1995a, 1995b), which contains five subscales for measuring symptoms at three levels of specificity. At the broadest level is General Distress: Mixed Symptoms, which contains symptoms shared by depressive and anxiety disorders (e.g., insomnia, fatigue). At the intermediate level are General Distress: Depressive Symptoms and General Distress: Anxious Symptoms, which reflect relatively nonspecific symptoms of depressive disorders (e.g., depression, hopelessness) and anxiety disorders (e.g., nervousness, tension), respectively. At the highest level of specificity are Anhedonic Depression, which reflects anhedonia and low positive affect, and Anxious Arousal, which reflects somatic tension and hyperarousal. These symptoms are considered relatively specific to each construct. In our sample, the MASQ subscales had excellent reliability (Cronbach’s α = .88–.95) and were moderately to strongly correlated (r = .42–.83, all p < .001).

Procedure

All procedures were approved by the University of Pennsylvania Institutional Review Board. Prior to any diagnostic assessment, the ADIS-5L interview began with an Introduction section containing open-ended questions about recent life changes or difficulties. First, participants were asked: “I would like to get a general idea of what sorts of problems you have been having recently. What have they been?” This was followed by: “What would you say is the main thing that is bothering you right now?” Participants subsequently were asked about stressors in each of several life domains, including family, social life, romantic relationships, work/school, finances, health, and legal matters (e.g., “In the past year, have you had any changes in or difficulties with… family?”). Two follow-up questions asked about employment or schooling (“What kind of work/schooling are you in now?” “What are your short-term educational or employment goals?”). These two questions, which were included for a separate aim of the parent project, had the added benefit of increasing participant word count as well as content coverage of this life domain. Participants spoke an average of 897 words (SD = 774, range = 202–6,046) in the Introduction section of the interview. Demographic correlates of word count are presented in the online supplement.

After this section, participants proceeded with the rest of the ADIS-5L interview, completing the MDD and GAD sections and continuing until their eligibility for the parent study was determined. Participants who were diagnosed with MDD or GAD or who screened negative for lifetime psychopathology on the ADIS-5L returned to the lab approximately three weeks later and completed the MASQ.

Data Processing

Trained research assistants blind to participants’ clinical status transcribed the audio recordings of the ADIS-5L Introduction section using XTrans software (Glenn et al., 2009). Transcription was carried out in accordance with a transcription protocol developed by the Linguistic Data Consortium (LDC) at the University of Pennsylvania and adapted for this project with guidance by the LDC. For each participant, we produced a verbatim transcript of all participant and interviewer speech, marking difficult-to-decipher transcript regions. A second independent transcriber listened to each audio recording, making corrections to the transcript as needed and paying particular attention to difficult-to-decipher regions. When participants’ enunciation or audio quality made transcription more challenging, a third independent transcriber performed an extra check of the transcription. The transcribing team, led by the first author, met weekly to prevent transcriber drift from this protocol.

Next, we converted the transcribed speech into variables (features) for use in statistical analyses. We extracted lexicon-based assessments of language features (e.g., I-usage, negative emotion, sadness, anxiety¹ words), language-based model estimates of psychological traits (depression, anxiety, stress, loneliness, anger, locus of control), and meta-language features (total words, average word length) from participants’ transcribed language. Further details about the linguistic feature extraction process appear in the online supplement.
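As an illustration of what a lexicon-based feature looks like, the following minimal Python sketch computes the share of a transcript's words that fall in a word list. It is not the extraction pipeline used in the study (those details are in the online supplement). The tokenizer and the function names are assumptions made for illustration, and the word list is only the set of I-usage words shown in Table 1, not the full LIWC dictionary.

```python
import re

# I-usage words as listed in the Table 1 row for first-person singular pronouns.
# Assumption: this short list stands in for the full (proprietary) LIWC category.
I_USAGE = {"i", "my", "i'm", "me", "i've", "myself", "i'll", "i'd", "mine"}


def tokenize(text):
    """Lowercase the text, normalize curly apostrophes, and keep contractions whole.

    Hypothetical tokenizer; the study's own tokenization may differ.
    """
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)


def relative_frequency(tokens, lexicon):
    """Proportion of tokens that belong to the lexicon (0.0 for an empty transcript)."""
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)


def meets_word_threshold(tokens, minimum=200):
    """Check a transcript against a minimum word count (200 words for the core analyses)."""
    return len(tokens) >= minimum


tokens = tokenize("I'm tired, and I lost my job.")
i_usage_rate = relative_frequency(tokens, I_USAGE)
```

Expressing the feature as a proportion rather than a raw count keeps it comparable across participants who spoke for different lengths of time.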

Statistical Analysis

We performed analyses using R, version 4.0.3 (R Core Team, 2020) and the Differential Language Analysis ToolKit, version 1.2.6 (Schwartz et al., 2017). We created figures using the radarchart function from the fmsb package in R (version 0.7.3, Nakazawa, 2022). For core correlational and prediction analyses using clinical ratings, we applied a minimum threshold of 200 words to balance recommendations in the literature (Kern, 2016) while retaining the largest sample possible (N = 486). For MASQ correlational analyses, we used a threshold of 100 words to maximize power and increase sensitivity to detect effects given the smaller sample (n = 241).²

Correlational Analyses

We used ordinary least squares regression to quantify the relationships of our depression and anxiety constructs with each language feature, reporting effects as standardized beta weights (β), which can be interpreted analogously to Pearson’s r. In initial exploratory analyses, we performed separate, parallel analyses for depression and anxiety. In subsequent hypothesis tests, we included depression and anxiety within the same analyses to isolate their effects. To obtain a measure of variance shared by depression and anxiety, we performed a principal component analysis using the depression and anxiety severity ratings and assigned each participant a score reflecting their value on the first principal component. To obtain measures of variance specific to depression, we examined language features associated with depression severity rating, controlling for anxiety severity rating as a covariate (and vice versa).
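The two building blocks described above, standardized regression weights with covariates and a shared-variance score from the first principal component, can be sketched in plain Python as follows. This is a rough illustration rather than the analysis code (which is available at the OSF link given under Transparency and Openness). It assumes that all variables are z-scored before fitting and that the principal component analysis is run on standardized severity ratings. The function names are hypothetical.

```python
import math


def zscore(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def standardized_betas(outcome, predictors):
    """OLS on z-scored variables; returns one standardized beta per predictor.

    No intercept column is needed because every z-scored variable has mean 0.
    """
    y = zscore(outcome)
    Z = [zscore(p) for p in predictors]
    k = len(Z)
    XtX = [[sum(a * b for a, b in zip(Z[i], Z[j])) for j in range(k)] for i in range(k)]
    Xty = [sum(a * b for a, b in zip(Z[i], y)) for i in range(k)]
    return solve(XtX, Xty)


def shared_component(dep, anx):
    """First principal component scores for two z-scored, positively correlated variables.

    For two standardized variables the first eigenvector is (1, 1) / sqrt(2),
    and the component's variance is 1 + r.
    """
    zd, za = zscore(dep), zscore(anx)
    return [(d + a) / math.sqrt(2) for d, a in zip(zd, za)]
```

Under these assumptions, a "specific depression" effect would be the first element of something like `standardized_betas(feature, [depression, anxiety, age, sex])`; which variable serves as the outcome is itself an assumption here. With the observed correlation of r = .55 between the two severity ratings, the first component of the standardized ratings would carry (1 + .55) / 2 = 77.5% of their total variance.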

Effect sizes between depression or related constructs and language features are typically in the range of r = .1 to .2 (Edwards & Holtzman, 2017). Using the pwr.r.test function from the pwr package in R (version 1.3.0, Champely, 2020), power was .91 to detect an effect of r = .15 using a sample as large as ours. To maximize power for primary hypotheses, we applied the standard p < .05 threshold without correction for multiple comparisons. For secondary hypotheses and exploratory analyses, we applied Benjamini-Hochberg (1995) false discovery rate correction. As language use varies by age and sex (Schwartz et al., 2013), we controlled for these variables in all analyses.³
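The power figure and the false discovery rate procedure can both be illustrated with a short Python sketch. The power function uses the Fisher z approximation, which is an assumption of this example and not necessarily the exact computation performed by pwr.r.test, although it yields a value of about .91 for r = .15 and N = 486. The Benjamini-Hochberg function follows the standard step-up rule. Both function names are hypothetical.

```python
import math
from statistics import NormalDist


def power_for_correlation(r, n, alpha=0.05):
    """Approximate two-sided power to detect a correlation r with n observations.

    Uses the Fisher z transformation: atanh(r_hat) is roughly normal with
    standard error 1 / sqrt(n - 3).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = math.atanh(r) * math.sqrt(n - 3)
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level q.

    Step-up rule: find the largest rank k with p_(k) <= (k / m) * q and
    reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


study_power = power_for_correlation(0.15, 486)
```

Because the rule is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value further down the ordering meets its threshold.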

Language-Based Prediction

Finally, we used machine learning to build predictive models of specific depression and anxiety, as well as their common factor, to examine how much variance in these constructs can be explained collectively by language features. We used elastic net regression models, which perform regularization and feature selection (Zou & Hastie, 2005), allowing us to evaluate all abovementioned language features as predictors. We used 10-fold cross validation, repeatedly dividing our data into a set on which the model was trained (comprising 90% of the data) and a set on which the model was tested (comprising 10% of the data). R2 values convey the average variance in depression or anxiety predicted by language features alone across 10 repetitions of cross validation.
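The cross-validated elastic net procedure can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the DLATK implementation used in the study: the penalty strength `lam`, the mixing weight `alpha`, the coordinate descent solver, the fold assignment, and all function names are choices made for this example, and R² is computed within each held-out fold and then averaged.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator; produces exact zeros (feature selection)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - b0 - Xb||^2 + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals with all coefficients at zero
    coef = [0.0] * p
    col_ss = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # constant feature carries no information
            # covariance of feature j with the residual that excludes feature j
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * coef[j]
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new - coef[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                coef[j] = new
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return intercept, coef


def predict(model, X):
    intercept, coef = model
    return [intercept + sum(c * v for c, v in zip(coef, row)) for row in X]


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Shuffle indices once and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_val_r2(X, y, k=10, lam=0.1, alpha=0.5, seed=0):
    """Average held-out R^2 across k folds (train on k-1 folds, test on the rest)."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit_elastic_net([X[i] for i in train_idx],
                                [y[i] for i in train_idx], lam, alpha)
        preds = predict(model, [X[i] for i in test_idx])
        scores.append(r_squared([y[i] for i in test_idx], preds))
    return sum(scores) / k
```

Fitting the model only on the training folds, including the centering step, keeps information from the held-out participants from leaking into the estimate of predictive accuracy.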

Transparency and Openness

We report how we determined our sample size and all data exclusions, and we follow JARS (Kazak, 2018). Preregistration and analysis code are available at https://osf.io/95nhj/. The ADIS-5L and MASQ cannot be made publicly available as they are under copyright. Data for this study are not publicly available due to the sensitive nature of the language provided by participants; a de-identified data set is available by contacting the corresponding author.

Results

Language Features of Depression and Anxiety

We began by examining how clinician-rated depression and anxiety severity related to participants’ language during the interview. In these initial exploratory analyses, we examined the main effects of depression and anxiety without testing specificity vis-à-vis each other. Results, as well as the words from each language feature that appeared most often in our dataset, are presented in Table 1.

Table 1.

Common and Specific Language Features of Clinician-Assessed Depression and Anxiety Severity

Language Feature | Words | DEP β [95% CI] | Specific DEP β [95% CI] | ANX β [95% CI] | Specific ANX β [95% CI] | Shared DEP and ANX β [95% CI]
--- | --- | --- | --- | --- | --- | ---
PERCEPTUAL PROCESSES | say, feel, said, see, hard, looking, feeling, felt, pain, look | 0.12 [0.04, 0.21] | - | 0.13 [0.04, 0.21] | - | 0.15 [0.06, 0.23]
FEEL | feel, hard, feeling, felt, pain, weight, feels | 0.16 [0.07, 0.24] | - | 0.12 [0.03, 0.20] | - | 0.16 [0.07, 0.24]
joy | (with, life, family, yes, year, day, good, health, work, things) | −0.13 [−0.21, −0.21] | - | −0.13 [−0.22, −0.04] | - | −0.15 [−0.24, −0.06]
NEGATIVE EMOTION | anxiety, bad, difficult, lost, pain, difficulties, sorry, sick, weird, worse | 0.13 [0.04, 0.21] | - | 0.23 [0.14, 0.31] | 0.16 [0.08, 0.25] | 0.20 [0.11, 0.28]
Depression score | (um, stress, it’s, depression, i’m, anxiety, social, don’t, feel, hm) | 0.12 [0.03, 0.2] | - | 0.21 [0.12, 0.29] | 0.15 [0.06, 0.24] | 0.17 [0.09, 0.26]
Stress score | - | 0.16 [0.08, 0.25] | - | 0.20 [0.11, 0.28] | 0.12 [0.03, 0.21] | 0.19 [0.10, 0.27]
Anxiety score | (stress, i’m, lot, um, anxiety, feel, there's, year, fine, anxious) | - | - | - | 0.11 [0.02, 0.20] | -
valence | (know, right, mom, live, sure, great, fine, education, happy, goal) | - | - | −0.14 [−0.22, −0.05] | −0.13 [−0.21, −0.04] | -
ANXIETY | anxiety, worry, anxious, worried | - | −0.11 [−0.20, −0.02] | 0.22 [0.13, 0.30] | 0.21 [0.12, 0.29] | 0.13 [0.04, 0.21]
fear | (uh, anxiety, stressful, job, stress, going, for, find, get, out) | - | −0.11 [−0.2, −0.02] | - | - | -
POSITIVE EMOTION | kind, well, okay, good, pretty, definitely, better, care, love, great | −0.13 [−0.22, −0.05] | - | - | - | −0.12 [−0.21, −0.03]
disgust | (like, um, that, you, her, she, people, mean, kind, they) | 0.13 [0.04, 0.21] | - | - | - | 0.13 [0.04, 0.22]
surprise | (was, actually, been, trying, pretty, really, lot, doing, uh, had) | −0.12 [−0.21, −0.04] | - | - | - | −0.12 [−0.21, −0.03]
trust | (you, relationship, things, terms, change, we, issues, life, sometimes, okay) | −0.12 [−0.21, −0.04] | - | - | - | −0.13 [−0.22, −0.04]
anticipation | (what, hm, about, for, new, income, see, year, program, twenty) | −0.24 [−0.32, −0.15] | −0.13 [−0.22, −0.05] | −0.22 [−0.30, −0.13] | - | −0.26 [−0.34, −0.18]
sadness | (kind, but, no, depression, feel, guess, died, last, depressed, years) | 0.11 [0.02, 0.19] | - | - | - | -
SADNESS | lost, sorry, broke, low, sad, alone, lose | 0.23 [0.15, 0.31] | 0.14 [0.05, 0.22] | 0.18 [0.09, 0.26] | - | 0.24 [0.15, 0.32]
Loneliness score | (i, to, my, like, you, that, me, i’m, but, a) | 0.20 [0.12, 0.28] | 0.15 [0.06, 0.23] | 0.12 [0.03, 0.21] | - | 0.19 [0.11, 0.28]
1st PERSON SINGULAR PRONOUNS (I-usage) | i, my, i'm, me, i've, myself, i'll, i'd, mine | 0.24 [0.15, 0.32] | 0.17 [0.08, 0.25] | 0.15 [0.07, 0.24] | - | 0.23 [0.14, 0.31]
PERSONAL PRONOUNS | i, my, you, i'm, me, they, she, he, i've, we | 0.25 [0.17, 0.33] | 0.18 [0.09, 0.26] | 0.15 [0.07, 0.24] | - | 0.24 [0.15, 0.32]
PRONOUNS | i, that, my, it, you, i'm, it's, me, that's, they | 0.22 [0.14, 0.30] | 0.15 [0.06, 0.23] | 0.15 [0.06, 0.23] | - | 0.21 [0.13, 0.30]
ARTICLES | a, the, an | −0.12 [−0.20, −0.03] | - | - | - | −0.11 [−0.20, −0.02]
COMMON VERBS | was, know, i'm, it's, have, is, do, been, had, don't | 0.11 [0.03, 0.20] | - | - | - | 0.13 [0.04, 0.22]
TOTAL FUNCTION WORDS | i, and, like, to, a, the, that, of, my, so | 0.13 [0.04, 0.21] | - | - | - | 0.13 [0.04, 0.22]
PREPOSITIONS | like, to, of, in, with, for, on, at, about, out | - | - | - | - | 0.13 [0.04, 0.21]
INTERROGATIVES | what, when, which, where, how, who, why, whatever, what's, whether | - | - | 0.15 [0.06, 0.24] | - | 0.13 [0.04, 0.21]
CONJUNCTIONS | and, sorry, but, because, or, then, as, when, if, also | - | - | 0.12 [0.03, 0.20] | - | 0.12 [0.04, 0.21]
NEGATIONS | not, no, don't, didn't, can't, nothing, never, haven't, wasn't, couldn't | - | - | −0.22 [−0.30, −0.13] | −0.18 [−0.26, −0.09] | −0.16 [−0.25, −0.07]
INGESTION | weight, eat, eating | 0.17 [0.08, 0.25] | 0.14 [0.05, 0.22] | - | - | 0.14 [0.05, 0.23]
BODY | sleep, heart, blood, head | 0.14 [0.05, 0.22] | - | - | - | 0.14 [0.05, 0.22]
CAUSATION | because, how, make, since, why, used, change, changes, made, making | 0.11 [0.02, 0.20] | - | - | - | 0.11 [0.02, 0.20]
WORK | work, school, working, worked, education, employment, company, classes, office, learn | −0.18 [−0.26, −0.10] | - | −0.19 [−0.27, −0.10] | - | −0.22 [−0.30, −0.13]
TOTAL WORDS | - | 0.18 [0.10, 0.27] | - | 0.14 [0.05, 0.22] | - | 0.18 [0.09, 0.27]
WORD LENGTH | - | - | −0.12 [−0.20, −0.03] | - | - | -

Note. Columns display relationships of language features with depression severity (“DEP”), anxiety severity (“ANX”), depression severity controlling for anxiety severity (“Specific DEP”), anxiety severity controlling for depression severity (“Specific ANX”), and the first principal component of depression and anxiety severity (“Shared DEP and ANX”). All analyses control for age and sex. Capitalized language features refer to a Linguistic Inquiry and Word Count (LIWC 2015) category; italicized features refer to National Research Council Canada (NRC) weighted lexica; all other features refer to language-based models for the given construct. Words are the 10 words in the given category most frequently used by participants (or, for weighted lexica and language-based models, words with the greatest frequency-by-term-weight product) in the current sample in descending order by (weighted) frequency. The stress score does not have top words as there is no lexicon-based version of this model. All effect sizes displayed meet Benjamini-Hochberg corrected significance levels. For clarity of presentation, language features with no significant effect sizes are omitted.

Depression Features

Depression results appear in the leftmost column (“DEP”) of Table 1. They replicate many features previously associated with depression. More severely depressed individuals used more words from the I-usage category and parent pronoun categories (personal pronouns, total pronouns). More depressed individuals also used more feel, negative emotion, and sadness words, used fewer joy, trust, anticipation, and surprise words, and scored higher on language-based models of depression, stress, and loneliness. Depression was additionally related to causation words and to language reflecting bodily processes, including ingestion, body, and disgust.

Anxiety Features

Anxiety results appear in the middle column (“ANX”) of Table 1. Anxiety results were virtually identical to depression results for pronoun and emotion features, except that anxiety was also related to anxiety and negative valence words, correlated with only one of the two sadness categories, and was unrelated to disgust, surprise, and trust. More severely anxious individuals used fewer words from the negations category (words used to contradict or deny). In contrast to the depression results, anxiety was unrelated to cognitive and biological processes categories.

Common and Specific Language Features of Clinically-Assessed Depression and Anxiety

Planned Analyses

Next, we tested a priori hypotheses regarding language features shared by and unique to depression and anxiety.

Common Features.

We isolated the first principal component of depression and anxiety severity ratings and examined its associations with hypothesized language features. As predicted, the common factor of depression and anxiety shared significant associations (95% CI in brackets) with I-usage (r = .23 [.14, .31]) and negative emotion language (r = .19 [.11, .28]). Also as predicted, individuals with higher scores on the common factor used more language from the sadness and anxiety emotion subcategories and from language-based models of stress and depression (all r > .12, all p < .026). Surprisingly, the common factor was not associated with the language-based anxiety model, nor with any language related to anger. Though counter to our predictions, the distinct pattern for anger provides evidence of discriminability between emotion constructs.

Depression-Specific Features.

We calculated partial correlations to identify features that remained associated with depression severity when controlling for anxiety severity. As predicted, depression was specifically associated with decreased positive emotion words (r = −.09 [−.17, .00]). Contrary to predictions, depression was not specifically associated with reward or leisure language.
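For intuition, the partial correlation with a single covariate can be computed directly from the three pairwise correlations, as in the short Python sketch below. This is a simplification: the analyses reported here also adjust for age and sex, which requires the general (residual-based) form. The function name is hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for a single z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

Here x could stand for a language feature, y for depression severity, and z for anxiety severity. The stronger the correlation between y and z (r = .55 in this sample), the more a feature's raw association with depression can shrink once anxiety is held constant.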

Anxiety-Specific Features.

In partial correlations controlling for depression severity, clinician-rated anxiety was not specifically associated with any of our three predicted categories: risk, fear, or physiological sensation language.

Exploratory Analyses

Common Features.

Next, taking an exploratory approach, we examined all possible language features as correlates of the common factor of depression and anxiety. To control the false discovery rate, we report only effects that met Benjamini-Hochberg corrected levels of significance. As shown in the rightmost column (“Shared DEP and ANX”) of Table 1, these analyses revealed numerous language features shared by anxiety and depression, many of which were associated with depression and anxiety in our initial analyses. These included features that previous studies have attributed to depression, such as feel and loneliness, as well as several cognitive (causation) and biological (ingest, body) processes categories.
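
The Benjamini-Hochberg step-up procedure (Benjamini & Hochberg, 1995) can be sketched in a few lines of Python. The false discovery rate level `q = 0.05` and the p-values below are assumptions for illustration only.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * q,
    then reject every hypothesis whose p-value has rank <= k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject

# Hypothetical p-values for four language features.
p_values = [0.9, 0.03, 0.001, 0.035]
print(benjamini_hochberg(p_values))
```

Because the rule is step-up, a p-value that misses its own rank threshold (here .03 at rank 2) is still rejected when a larger p-value at a higher rank meets its threshold.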

Specific Features.

We used the same exploratory approach to identify language features specific to clinician-rated depression and anxiety severity, respectively (see “Specific DEP” and “Specific ANX” columns in Table 1). To facilitate comparisons, we present radar plots in Figure 1 showing effect sizes for (specific) depression and (specific) anxiety across selected language features.

Figure 1.

Radar plots of effect sizes (standardized beta coefficients) for selected affective and somatic (top) and style and part-of-speech (bottom) language features, each with depression and specific depression (left) and anxiety and specific anxiety (right). The (−) symbol indicates that the inverse effect size is plotted to facilitate comparison. All analyses control for age and sex.

Depression-Specific Features.

The pronoun effects detected in earlier depression main effect analyses survived when we controlled for anxiety. Unexpectedly, many of the emotion features did not. At Benjamini-Hochberg corrected significance levels, specific depression was no longer associated with (lack of) positive emotion words. However, specific depression was positively associated with sadness and loneliness language, and negatively associated with anxiety, fear, and anticipation language. Additionally, specific depression was associated with ingestion words and with a tendency to use shorter (versus longer) words.

Anxiety-Specific Features.

None of the pronoun effects detected in anxiety main effect analyses survived when we controlled for depression. However, specific anxiety remained associated with negative emotion and with a wide array of emotion terms. The exceptions were sadness and loneliness, which were unique to depression. Specific anxiety was once again associated with decreased negations.

Diagnostic Status Analyses.

We repeated these analyses using depression and anxiety diagnostic status (rather than severity). As expected, fewer of the associations were statistically significant, given the lower statistical power afforded by dichotomous relative to continuous clinical variables. Nevertheless, the pattern of results was essentially unchanged (see Table S2 in the online supplement). Depression status was related to pronoun language, but unrelated to emotion language except sadness and loneliness; when we controlled for anxiety status (which was unrelated to these language features), no significant effects remained. Conversely, anxiety status was unrelated to pronoun language, but was related to negative emotion and other emotion terms; these effects survived when we controlled for depression status.

Common and Specific Language Features of Self-Reported Depression and Anxiety

Next, we sought to replicate our results using the MASQ. This measure provided a conservative test of replication, given its different method of assessment (self-reported rather than clinician-assessed), different approach to separating shared from specific features (psychometric rather than statistical), and content extending beyond major depression and GAD to encompass depression and anxiety more broadly. Results are presented in the online supplement (see Table S3) and summarized below.

General Distress

Several language features were related to General Distress: Mixed Symptoms, the symptoms that are fully shared by depression and anxiety. They included I-usage, feel, sadness, depression, loneliness, and stress. They also included more prepositions and fewer negations, anticipation, and work words.

Common and Specific Depression

The language correlates of mixed symptoms were also correlated with symptoms that are relatively specific (General Distress: Depressive Symptoms) and highly specific (Anhedonic Depression) to depression. Additionally, both depression subscales were related to perceptual processes, (lack of) joy, and more total words. Anhedonic depression was related to I-usage’s parent category, personal pronouns.

Common and Specific Anxiety

General Distress: Anxious Symptoms, which are relatively specific to anxiety, were associated with a small subset of the features associated with depression, including depression, stress, negations, and decreased work language. Unlike depression, Anxious Symptoms were associated with anxiety and were not associated with pronoun categories. Anxious Arousal, the MASQ subscale most specific to anxiety, had no significant language correlates.

Language-Based Models of Depression and Anxiety

Lastly, we built machine learning models of clinician-rated depression and anxiety severity by including all language features as predictors in elastic net models using 10-fold cross-validation. In a model controlling for age and sex, language features collectively accounted for 14% of the variance in the common factor of depression and anxiety.
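
To make the modeling approach concrete, the following Python sketch fits an elastic net by coordinate descent and scores it with k-fold cross-validation. It is an illustration under stated assumptions rather than the pipeline used here: the penalty follows the common parameterization (1/2n)·RSS + λ[α‖β‖₁ + (1 − α)/2·‖β‖₂²], the folds are assigned by index rather than at random, predictors are assumed to be standardized by the caller, and the values of `lam`, `alpha`, and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate-descent elastic net; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering the data lets the intercept be recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            # Covariance of predictor j with the partial residual.
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            denom = sum(c * c for c in col) / n + lam * (1 - alpha)
            new = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def cross_validated_r2(X, y, lam, alpha, k=10):
    """Out-of-sample R^2 pooled across k folds (fold = row index mod k)."""
    n = len(X)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        b0, beta = fit_elastic_net([X[i] for i in train],
                                   [y[i] for i in train], lam, alpha)
        for i in held_out:
            preds[i] = b0 + sum(b * v for b, v in zip(beta, X[i]))
    y_mean = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Toy data: severity depends on the first feature only.
X = [[float(i), float((2 * i) % 5)] for i in range(20)]
y = [3.0 + 2.0 * i for i in range(20)]
print(fit_elastic_net(X, y, lam=0.5, alpha=0.5))
print(cross_validated_r2(X, y, lam=0.5, alpha=0.5, k=5))
```

In practice the penalty strength and mixing weight would themselves be tuned, typically inside the cross-validation loop so that the reported variance explained remains an out-of-sample estimate.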

For tests of specificity, we used a two-stage modeling process: We first fit a model in which age, sex, and the clinical covariate (anxiety or depression severity) predicted our clinical outcome (depression or anxiety severity), then built a second model predicting the first model’s residuals using language features alone. Above and beyond demographics, language features accounted for 5% of the variance in depression (controlling for anxiety), and 8% of the variance in anxiety (controlling for depression).
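
The two-stage logic can be sketched in Python as follows. This is a simplified illustration, not the authors' code: both stages use ordinary least squares fit in-sample, whereas the models reported here are cross-validated elastic nets, and all data and names are hypothetical.

```python
def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(X)
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented normal equations [A'A | A'y].
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
         + [sum(A[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][p] for r in range(p)]

def predict(coefs, row):
    """Prediction for one observation from intercept-first coefficients."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))

def two_stage(covariates, language, outcome):
    """Stage 1: outcome ~ covariates. Stage 2: stage-1 residuals ~ language.

    Returns the stage-2 coefficients and the share of residual variance
    that the language features explain.
    """
    stage1 = ols_fit(covariates, outcome)
    resid = [y - predict(stage1, row) for row, y in zip(covariates, outcome)]
    stage2 = ols_fit(language, resid)
    fitted = [predict(stage2, row) for row in language]
    mean_resid = sum(resid) / len(resid)
    ss_tot = sum((r - mean_resid) ** 2 for r in resid)
    ss_res = sum((r - f) ** 2 for r, f in zip(resid, fitted))
    return stage2, 1 - ss_res / ss_tot

# Hypothetical data: columns are age, sex, and anxiety severity.
covariates = [[25, 0, 3], [34, 1, 5], [41, 1, 2], [52, 0, 6],
              [29, 1, 4], [63, 0, 1], [47, 0, 5], [38, 1, 3]]
language = [[5.0], [6.2], [4.8], [6.9], [6.5], [4.1], [5.3], [6.0]]
depression = [4, 6, 3, 7, 6, 2, 5, 5]
print(two_stage(covariates, language, depression))
```

Because the second model sees only what the covariates could not explain, any variance it captures is attributable to language over and above demographics and the other clinical construct.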

Discussion

Previous work on language features of depression has relied on nonclinical samples and nonspecific psychopathology measures, leaving open the question of whether these features are unique to depression vis-à-vis frequently co-occurring conditions, especially anxiety. In this study, we investigated associations of computationally-derived language features with depression and anxiety in a mixed clinical sample using gold standard measures. We replicated language features previously associated with depression, then showed that many of these features are also associated with anxiety. Next, using a more fine-grained approach, we determined which features are common and specific to depression and anxiety. Lastly, we showed that speech characteristics during a brief interview explained significant variance in depression and anxiety, over and above other clinical and demographic predictors. These results shed new light on the language—and in turn, the inner experience—of depression and anxiety, with implications for theory, research, and practice.

Depression and Anxiety Share Language Features

Several language features were associated with the common factor of clinician-assessed depression and anxiety as well as self-reported symptoms overlapping between depression and anxiety. In particular, increased use of perceptual processes, body, and feel words suggests a focus on one’s internal states. Decreased use of work words may reflect unemployment but may also reflect reduced attentional focus on external places and events (e.g., work, school) relative to internal experiences (e.g., thoughts, feelings). Increased causation language may reflect a maladaptive, abstract-evaluative mode of processing focusing on reasons for feelings and events (Watkins, 2008).

Interestingly, the common factor was associated with alterations in several parts of speech. Increased preposition use may reflect concern with precision (Tausczik & Pennebaker, 2010), whereas increased interrogative use may index searching for information and reflect intolerance of uncertainty (Gentes & Ruscio, 2011). Decreased use of articles, which are typically paired with concrete nouns (real-world objects; Tausczik & Pennebaker, 2010), may reflect increased internal (relative to external) attention. Increased common verb use was driven by verbs commonly paired with I-usage (e.g., have, am, know) and may similarly reflect difficulties with self-distancing (Nook et al., 2022). Lastly, increased conjunction use, and more total words, may reflect a larger number of problems reported in the interview.

Taken together, these findings indicate that many language features previously associated with depression are better understood as the language of distress. This echoes prior observations that depression and anxiety share more similarities than differences, not only in their symptoms but also in their genetic, temperamental, cognitive-behavioral, and environmental risk factors (Ruscio & Khazanov, 2017).

I-Usage is Relatively Specific to Depression

Aligning with previous studies linking I-usage to indicators of general distress, including neuroticism (Tackman et al., 2019), I-usage was related to clinician-rated and self-report measures of shared depression and anxiety. However, I-usage appeared relatively more specific to depression, as it was associated with specific depression but not specific anxiety across clinician- and self-report measures. Taken together, these findings position I-usage in a transitional space between shared and specific depression, in line with the notion that general distress variance occurs along a dimension (Watson, 2009).

I-usage likely reflects self-focused attention (SFA), a focus on information generated internally (from the self) rather than externally (from the environment; Ingram, 1990). SFA involves preoccupation with one’s negative thoughts and feelings and serves to maintain or amplify negative affect (Mor & Winquist, 2002). When expressed in language, this type of (over)focus on the self puts depressed individuals at risk of rejection, as they tend to disclose unsolicited negative content, often in violation of social norms, and seek excessive reassurance that frustrates and annoys others (Hames et al., 2013). Theorists have disagreed about whether SFA reflects general distress or depression in particular. Our findings are in line with evidence that SFA is associated with both depression and anxiety, but shares a significantly stronger relationship with depression (Mor & Winquist, 2002).

Relatedly, I-usage is a hallmark of the self-immersed perspective, the experience of focusing narrowly on concrete details of one’s present experience rather than on the broader perspective afforded by self-distancing (Kross & Ayduk, 2011). Degree of distancing has been indexed in language by combining proportion of non-first-person singular pronouns (i.e., lack of I-usage) with proportion of non-present tense verbs (Nook et al., 2022). Observational, experimental, and longitudinal research has demonstrated a bidirectional relationship between linguistic self-distancing and decreased depression (e.g., Nook et al., 2022; Shahane & Denny, 2019). Taken together with common verbs being significantly associated only with depression severity, our findings suggest that the self-immersed perspective may be relatively specific to depression vis-à-vis anxiety.

Negative Emotion Language is Relatively Specific to Anxiety

Several emotion language features were associated with the common factor of depression and anxiety. However, tests for specificity revealed these features to be more robustly associated with anxiety. Specific anxiety was related to negative emotion, its subcategory anxiety, and negative valence, as well as language-based models of anxiety, depression, and stress. By contrast, specific depression was not related to the broad negative emotion feature but did share associations with sadness, (lack of) positive emotion (which did not survive correction for multiple comparisons), and the language-based loneliness model (which is heavily weighted by I-usage terms). Our results shed light on previous mixed results for negative emotion language in depression (Ireland & Mehl, 2014), hinting that these results may have been influenced by the level of co-occurring anxiety.

Although it is perhaps unsurprising that sadness language is specific to depression, our results expand upon prior research (Sonnenschein et al., 2018) by showing that depression is associated narrowly with sadness and lack of positive emotion language, whereas anxiety is associated with a broader array of negative emotion words. In fact, even the language-based depression model was relatively more specific to anxiety than depression. Language-based models are typically built using less than gold standard measures of depression due to the massive samples required for model development. However, our results indicate that these models would benefit from greater specificity to the construct of interest before they are rolled out at scale. To improve specificity, future work should include measures of both depression and anxiety and recruit samples with a wide range of scores on both constructs, such as mixed clinical samples or risk-enriched samples (e.g., persons with a family history of emotional disorder). Our two-stage modeling process provides an initial blueprint for what model development prioritizing specificity could look like. That said, it is encouraging that the language-based anxiety model showed good specificity to anxiety, so much so that it was unrelated to any measure of depression or the common factor. With further development, these models could be used in systems aimed at early disease detection at the individual and population level.

Novel Language Features of Depression and Anxiety

Our exploratory analyses uncovered several language features not previously described in the literature. Specific depression was related to using shorter words, which may be a marker of fatigue or psychomotor retardation. It was also related to ingestion language, which was dominated by references to weight and eating in our sample. This language aligns with appetite and weight disturbances in MDD, as well as with gastrointestinal disturbances recognized in syndromal depression measures (Hamilton, 1960). Beyond these somatic explanations, weight and eating concerns may reflect body dissatisfaction or low self-image.

Specific anxiety was related to lack of negations. While on its face this seems contrary to anxiety’s relationship with negative emotion language, in the context of an interview about life difficulties, negation words (e.g., not, no, don’t) likely indicate denying problems. Thus, reduced use of negations likely reflects more problems shared during the interview. Consistent with this interpretation, individuals with anxiety in our sample (operationalized as those diagnosed with GAD) had a larger number of comorbid diagnoses than individuals with depression alone. Taken together with the I-usage findings, this suggests that when describing recent difficulties, depression is expressed through talking about one’s internally-generated experiences, while anxiety is expressed through talking about one’s problems. Relationships of conjunctions (which may also index talking about problems) and interrogatives (perhaps reflecting searching for answers) with anxiety (before controlling for depression) provide further evidence that anxiety is expressed via discussing problems.

Language-Based Models of Depression and Anxiety

Lastly, our demonstration that language features extracted from a brief, open-ended interview capture variance in depression and anxiety raises implications for clinical practice. Clinics routinely conduct intake interviews assessing patients’ current concerns; language-based models unobtrusively “layered onto” these intakes could do double duty by yielding quantitative estimates of psychopathology severity. This approach would complement existing assessment methods by circumventing idiosyncratic rating patterns (e.g., over- or under-reporting, yea- or nay-saying, low insight; Hunt et al., 2003) that bias self-report responses to overt questioning about sensitive symptoms. This approach might even reduce the time between when a patient seeks treatment and when the first intervention is delivered: Assessment could begin at the first phone call, rather than the first face-to-face meeting. To realize this promise, our language-based models, which explained 14% of the common variance and 5–8% of specific variance in depression and anxiety, will need to be strengthened using larger samples and longer interviews, which increase statistical power to detect the often-subtle effects that comprise these models.

Strengths and Limitations

Our study had several strengths. We used a mixed clinical sample, which is known to increase statistical power by capturing the widest possible range of severity scores (Stade & Ruscio, 2022). We also performed rigorous tests for specificity using high-quality measures of depression and anxiety. Nevertheless, our specificity tests did not account for all possible co-occurring conditions. Future work could include additional common comorbidities as covariates. Large, well-characterized samples will be required to achieve adequate statistical power for these tests. Larger samples would also allow language correlates to be examined for less reliably measured constructs such as individual symptoms of psychopathology, an important endeavor given the heterogeneity of many clinical conditions, particularly depression. Practice research networks (Parry et al., 2010) could allow such samples to be amassed by aggregating large numbers of recorded intake interviews across clinics.

Another strength of this study was the context in which we captured language. By prompting participants to describe problems to a clinician in a private setting, we likely increased our ability to detect language markers reflecting participants’ private experience. However, it is unclear to what extent this context drove our results. While early evidence indicates that language format (written versus spoken) and setting (private versus public) do not always moderate effects (Edwards & Holtzman, 2017), more work is needed to examine the robustness of effects across contexts. In particular, as prompting participants to focus on life difficulties may have influenced some language correlates observed here (e.g., negations), there is a need to replicate these results using open-ended prompts or language captured in daily life.

Finally, our clinician-rated measure of anxiety focused on GAD. In some respects, this was a strength of the study: Given its high comorbidity, symptom overlap, and etiological relationship with depression (Watson, 2009), GAD provided an especially conservative test of language features discriminating depression from anxiety. The generalizability of these results is bolstered by their replication using a high-quality self-report measure assessing transdiagnostic anxiety symptoms rather than a single disorder. However, a different measure of anxiety, especially one focusing on fear rather than distress (Watson, 2009), may have yielded different results. Had we assessed an anxiety disorder with prominent fear symptoms, we might have found the expected associations with fear words and with the physiological sensation lexicon, which could reflect sympathetic arousal. However, it is notable that we observed no language correlates for the MASQ Anxious Arousal subscale, which maps more closely to fear than distress. Future work on the language of anxiety should include fear-based anxiety disorders to provide broader coverage of the anxiety disorder spectrum.

Conclusion

Recent years have witnessed a surge of interest in using computational linguistic methods to understand and predict psychological phenomena. In psychopathology, these efforts have concentrated on depression, typically using social media posts (Guntuku et al., 2017). The current study found that language features previously associated with depression replicate when interview responses and gold standard depression measures are used. However, through careful controls for specificity, we showed that many of these language features are shared with anxiety. Our findings introduce caution into efforts to develop language-based models of psychological constructs, suggesting that researchers may need to consider and control for boundary constructs to develop models with good discriminant validity. As more accurate, specific models are developed, the unobtrusive assessment of psychopathology from language could improve clinical care. Such models, and the rich natural language on which they are based, have the potential to yield unique insights into the lived experience of psychopathology.

Acknowledgments

This work was supported by the National Institute of Mental Health [grant R01-MH094425 to A.M.R.] and the National Science Foundation [grant DGE-1845298 to E.C.S.]. The opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

Footnotes

The authors have no conflicts of interest to report.

This study was approved by the University of Pennsylvania Institutional Review Board (Protocol 815763).

Portions of this research were presented at the annual meeting of the Society for Research in Psychopathology, Philadelphia, PA, September, 2022.

Preregistration and analysis code are available on the Open Science Framework (https://osf.io/95nhj/).

1. The negative emotion category contains several subcategories of emotion, including sadness and anxiety.

2. We repeated our core analyses using this reduced word threshold and observed very similar results (see online supplement).

3. We repeated our core analyses without age/sex controls and observed very similar results (see online supplement).

References

  1. Barlow DH, Sauer-Zavala S, Carl JR, Bullis JR, & Ellard KK (2014). The nature, diagnosis, and treatment of neuroticism: Back to the future. Clinical Psychological Science, 2(3), 344–365. 10.1177/2167702613505532 [DOI] [Google Scholar]
  2. Benjamini Y, & Hochberg Y (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300. 10.1111/j.2517-6161.1995.tb02031.x [DOI] [Google Scholar]
  3. Brockmeyer T, Zimmermann J, Kulessa D, Hautzinger M, Bents H, Friederich H-C, Herzog W, & Backenstrass M (2015). Me, myself, and I: Self-referent word use as an indicator of self-focused attention in relation to depression and anxiety. Frontiers in Psychology, 6. 10.3389/fpsyg.2015.01564 [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Brown TA, & Barlow DH (2014). Anxiety and Related Disorders Interview Schedule for DSM-5 (ADIS-5L).: Lifetime Version. Client Interview Schedule. Oxford University Press. [Google Scholar]
  5. Champely S. (2020). pwr: Basic functions for power analysis. R package version 1.3-0. https://CRAN.R-project.org/package=pwr [Google Scholar]
  6. Clark LA, & Watson D (1991). Tripartite model of anxiety and depression: Psychometric evidence and taxonomic implications. Journal of Abnormal Psychology, 100(3), 316–336. [DOI] [PubMed] [Google Scholar]
  7. Coyne JC (1994). Self-reported distress: Analog or ersatz depression? Psychological Bulletin, 116(1), 29–45. [DOI] [PubMed] [Google Scholar]
  8. Craske MG, Rauch SL, Ursano R, Prenoveau J, Pine DS, & Zinbarg RE (2009). What is an anxiety disorder? Depression and Anxiety, 26(12), 1066–1085. 10.1002/da.20633 [DOI] [PubMed] [Google Scholar]
  9. Dirkse D, Hadjistavropoulos HD, Hesser H, & Barak A (2015). Linguistic analysis of communication in therapist-assisted internet-delivered cognitive behavior therapy for generalized anxiety disorder. Cognitive Behaviour Therapy, 44(1), 21–32. 10.1080/16506073.2014.952773 [DOI] [PubMed] [Google Scholar]
  10. Edwards T, & Holtzman NS (2017). A meta-analysis of correlations between depression and first person singular pronoun use. Journal of Research in Personality, 68, 63–68. 10.1016/j.jrp.2017.02.005 [DOI] [Google Scholar]
  11. Eichstaedt JC, Smith RJ, Merchant RM, Ungar LH, Crutchley P, Preoţuc-Pietro D, Asch DA, & Schwartz HA (2018). Facebook language predicts depression in medical records. Proceedings of the National Academy of Sciences, 115(44), 11203–11208. 10.1073/pnas.1802331115 [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Gentes EL, & Ruscio AM (2011). A meta-analysis of the relation of intolerance of uncertainty to symptoms of generalized anxiety disorder, major depressive disorder, and obsessive-compulsive disorder. Clinical Psychology Review, 31(6), 923–933. 10.1016/j.cpr.2011.05.001 [DOI] [PubMed] [Google Scholar]
  13. Geronimi EMC, & Woodruff-Borden J (2015). The language of worry: Examining linguistic elements of worry models. Cognition and Emotion, 29(2), 311–318. 10.1080/02699931.2014.917071 [DOI] [PubMed] [Google Scholar]
  14. Glenn ML, Strassel SM, & Lee H (2009). XTrans: A speech annotation and transcription tool. Interspeech 2009, 2855–2858. 10.21437/Interspeech.2009-729 [DOI] [Google Scholar]
  15. Guntuku SC, Buffone A, Jaidka K, Eichstaedt J, & Ungar L (2019a). Understanding and measuring psychological stress using social media. Proceedings of the Thirteenth International AAAI Conference on Web and Social Media (ICWSM 2019). http://arxiv.org/abs/1811.07430 [Google Scholar]
  16. Guntuku SC, Schneider R, Pelullo A, Young J, Wong V, Ungar L, Polsky D, Volpp KG, & Merchant R (2019b). Studying expressions of loneliness in individuals using twitter: An observational study. BMJ Open, 9(11), 1–8. 10.1136/bmjopen-2019-030355 [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Guntuku SC, Yaden DB, Kern ML, Ungar LH, & Eichstaedt JC (2017). Detecting depression and mental illness on social media: An integrative review. Current Opinion in Behavioral Sciences, 18, 43–49. 10.1016/j.cobeha.2017.07.005 [DOI] [Google Scholar]
  18. Hames JL, Hagan CR, & Joiner TE (2013). Interpersonal processes in depression. Annual Review of Clinical Psychology, 9(1), 355–377. 10.1146/annurev-clinpsy-050212-185553 [DOI] [PubMed] [Google Scholar]
  19. Hamilton M (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23(1), 56–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Hubley AM, & Zumbo BD (1996). A dialectic on validity: Where we have been and where we are going. The Journal of General Psychology, 123(3), 207–215. 10.1080/00221309.1996.9921273 [DOI] [Google Scholar]
  21. Hunt M, Auriemma J, & Cashaw ACA (2003). Self-report bias and underreporting of depression on the BDI-II. Journal of Personality Assessment, 80(1), 26–30. 10.1207/S15327752JPA8001_10 [DOI] [PubMed] [Google Scholar]
  22. Ingram RE (1990). Self-focused attention in clinical disorders: Review and a conceptual model. Psychological Bulletin, 107(2), 156–176. [DOI] [PubMed] [Google Scholar]
  23. Ireland ME, & Mehl MR (2014). Natural language use as a marker of personality. In Holtgraves TM (Ed.), The Oxford Handbook of Language and Social Psychology. Oxford University Press. 10.1093/oxfordhb/9780199838639.013.034 [DOI] [Google Scholar]
  24. Kazak AE (2018). Editorial: Journal article reporting standards. American Psychologist, 73(1), 1–2. 10.1037/amp0000263 [DOI] [PubMed] [Google Scholar]
  25. Kendall PC, Hollon SD, Beck AT, Hammen CL, & Ingram RE (1987). Issues and recommendations regarding use of the Beck Depression Inventory. Cognitive Therapy and Research, 11(3), 289–299. 10.1007/BF01186280 [DOI] [Google Scholar]
  26. Kern ML, Park G, Eichstaedt JC, Schwartz HA, Sap M, Smith LK, & Ungar LH (2016). Gaining insights from social media language: Methodologies and challenges. Psychological Methods, 21(4), 507–525. 10.1037/met0000091 [DOI] [PubMed] [Google Scholar]
  27. Kross E, & Ayduk O (2011). Making meaning out of negative experiences by self-distancing. Current Directions in Psychological Science, 20(3), 187–191. 10.1177/0963721411408883 [DOI] [Google Scholar]
  28. Mineka S, Watson D, & Clark LA (1998). Comorbidity of anxiety and unipolar mood disorders. Annual Review of Psychology, 49(1), 377–412. 10.1146/annurev.psych.49.1.377 [DOI] [PubMed] [Google Scholar]
  29. Mohammad S. (2018). Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 174–184. 10.18653/v1/P18-1017 [DOI] [Google Scholar]
  30. Mohammad SM (2018). Word affect intensities. Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018). https://saifmohammad.com/WebDocs/lrec2018-paper-word-emotion.pdf [Google Scholar]
  31. Mor N, & Winquist J (2002). Self-focused attention and negative affect: A meta-analysis. Psychological Bulletin, 128(4), 638–662. 10.1037/0033-2909.128.4.638 [DOI] [PubMed] [Google Scholar]
  32. Nakazawa M. (2022). fmsb: Functions for medical statistics book with some demographic data. R package version 0.7.3. https://CRAN.R-project.org/package=fmsb [Google Scholar]
  33. Nook EC, Hull TD, Nock MK, & Somerville LH (2022). Linguistic measures of psychological distance track symptom levels and treatment outcomes in a large set of psychotherapy transcripts. Proceedings of the National Academy of Sciences, 119(13), e2114737119. 10.1073/pnas.2114737119 [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Pennebaker JW, Boyd RL, Jordan K, & Blackburn K (2015). The development and psychometric properties of LIWC2015. Austin, TX: University of Texas at Austin.
  35. R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
  36. Rouhizadeh M, Jaidka K, Smith L, Schwartz HA, Buffone A, & Ungar L (2018). Identifying locus of control in social media language. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 1146–1152. 10.18653/v1/D18-1145
  37. Ruscio AM, & Khazanov GK (2017). Anxiety and depression. In DeRubeis RJ & Strunk DR (Eds.), Oxford handbook of mood disorders (pp. 313–324). Oxford University Press.
  38. Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, Agrawal M, Shah A, Kosinski M, Stillwell D, Seligman MEP, & Ungar LH (2013). Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE, 8(9), 1–16. 10.1371/journal.pone.0073791
  39. Schwartz HA, Eichstaedt J, Kern ML, Park G, Sap M, Stillwell D, Kosinski M, & Ungar L (2014). Towards assessing changes in degree of depression through Facebook. Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 118–125. 10.3115/v1/W14-3214
  40. Schwartz HA, Giorgi S, Sap M, Crutchley P, Ungar L, & Eichstaedt J (2017). DLATK: Differential Language Analysis ToolKit. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 55–60. 10.18653/v1/D17-2010
  41. Shaffer VN, Kim D, & Yoon KL (2021). Physiological sensation word usage in social anxiety disorder with and without comorbid depression. Journal of Behavior Therapy and Experimental Psychiatry, 71, 1–5. 10.1016/j.jbtep.2021.101638
  42. Shahane AD, & Denny BT (2019). Predicting emotional health indicators from linguistic evidence of psychological distancing. Stress and Health, 35(2), 200–210. 10.1002/smi.2855
  43. Sonnenschein AR, Hofmann SG, Ziegelmayer T, & Lutz W (2018). Linguistic analysis of patients with mood and anxiety disorders during cognitive behavioral therapy. Cognitive Behaviour Therapy, 47(4), 315–327. 10.1080/16506073.2017.1419505
  44. Stade EC, & Ruscio AM (2022). A meta-analysis of the relationship between worry and rumination. Clinical Psychological Science. Advance online publication. 10.1177/21677026221131309
  45. Tackman AM, Sbarra DA, Carey AL, Donnellan MB, Horn AB, Holtzman NS, Edwards TS, Pennebaker JW, & Mehl MR (2019). Depression, negative emotionality, and self-referential language: A multi-lab, multi-measure, and multi-language-task research synthesis. Journal of Personality and Social Psychology, 116(5), 817–834. 10.1037/pspp0000187
  46. Tausczik YR, & Pennebaker JW (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54. 10.1177/0261927X09351676
  47. Watkins ER (2008). Constructive and unconstructive repetitive thought. Psychological Bulletin, 134(2), 163–206. 10.1037/0033-2909.134.2.163
  48. Watson D (2009). Differentiating the mood and anxiety disorders: A quadripartite model. Annual Review of Clinical Psychology, 5(1), 221–247. 10.1146/annurev.clinpsy.032408.153510
  49. Watson D, Clark LA, Weber K, Assenheimer JS, Strauss ME, & McCormick RA (1995). Testing a tripartite model: II. Exploring the symptom structure of anxiety and depression in student, adult, and patient samples. Journal of Abnormal Psychology, 104(1), 15–25. 10.1037/0021-843X.104.1.15
  50. Watson D, Weber K, Assenheimer JS, Clark LA, Strauss ME, & McCormick RA (1995). Testing a tripartite model: I. Evaluating the convergent and discriminant validity of anxiety and depression symptom scales. Journal of Abnormal Psychology, 104(1), 3–14. 10.1037/0021-843X.104.1.3
  51. Zou H, & Hastie T (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. 10.1111/j.1467-9868.2005.00503.x

Associated Data


Supplementary Materials

Supplemental Material 1
Supplemental Material 2
Supplemental Material 3
Supplemental Material 4
Supplemental Material 5
Supplemental Material 6
