Construct validity for computational linguistic metrics in individuals at clinical risk for psychosis: associations with clinical ratings

Zarina R Bilgrami; Cansu Sarac; Agrima Srivastava; Shaynna N Herrera; Matilda Azis; Shalaila S Haas; Riaz B Shaik; Muhammad A Parvaz; Vijay A Mittal; Guillermo Cecchi; Cheryl M Corcoran

doi:10.1016/j.schres.2022.01.019

Author manuscript; available in PMC: 2023 Jul 1.

Published in final edited form as: Schizophr Res. 2022 Jan 29;245:90–96. doi: 10.1016/j.schres.2022.01.019

PMCID: PMC10062407 NIHMSID: NIHMS1775850 PMID: 35094918

Abstract

Language deficits are prevalent in psychotic illness, including its risk states, and are related to marked impairment in functioning. It is therefore important to characterize language impairment in the psychosis spectrum in order to develop potential preventive interventions. Natural language processing (NLP) metrics of semantic coherence and syntactic complexity have been used to discriminate schizophrenia patients from healthy controls (HC) and to predict psychosis onset in individuals at clinical high risk (CHR) for psychosis. To date, no studies have examined the construct validity of key NLP features with respect to clinical ratings of thought disorder in a CHR cohort. Herein we test the association of key NLP metrics of coherence and complexity with ratings of positive and negative thought disorder, respectively, in 60 CHR individuals, using Andreasen’s Scale for the Assessment of Thought, Language and Communication (TLC) to measure positive and negative thought disorder. As hypothesized, in CHR individuals, the NLP metric of semantic coherence was significantly correlated with positive thought disorder severity, and the NLP metrics of complexity (sentence length and determiner use) were correlated with negative thought disorder severity. The finding of construct validity supports the premise that NLP analytics, at least in respect to core features of reduction of coherence and complexity, are capturing clinically relevant language disturbances in risk states for psychosis. Further psychometric study is required, in respect to reliability and other forms of validity.

1. Introduction

The processing of language, including its production, is abnormal in schizophrenia and related psychotic disorders (Brown and Kuperberg, 2015). Abnormal language production reflects disturbances in thought, a prevalent feature of psychotic disorders related to marked impairment in social and role function (Roche et al., 2015). Language disturbance occurs early in the course of psychotic disorders, even prior to psychosis onset (Gooding et al., 2013) (Bearden et al., 2011) and can persist stably for years (Marengo and Harrow, 1997). As yet, there is no evidence-based treatment for abnormal language production in psychotic disorders beyond antipsychotics, and their efficacy is limited. It is important therefore to fully characterize language impairment in schizophrenia and its risk states, in order to develop potential therapeutics, including preventive interventions.

A common approach for the evaluation of language production in schizophrenia and other psychotic disorders is the Scale for the Assessment of Thought, Language and Communication (TLC), developed by Andreasen (Andreasen, 1986). In this scale, clinical ratings are typically applied to natural language production solicited in the context of interview, in which patients are asked to speak without interruption for ten minutes, and then asked questions both personal and abstract over the course of about forty minutes. The TLC has a heuristic of two main domains of thought disorder. One domain is “positive thought disorder”, which describes the loss of flow of meaning in speech, captured by TLC items such as tangentiality, derailment and circumstantiality. The second domain is “negative thought disorder”, captured by TLC items such as poverty of speech, and poverty of content. While TLC positive thought disorder is characteristic of both schizophrenia and affective disorders with psychosis, TLC negative thought disorder is more specific to schizophrenia, as found by Andreasen in her initial studies (Andreasen, 1979b) and later in meta-analysis (Yalincetin et al., 2017). Of note, TLC positive and negative thought disorder were evident as early as age nine in children at familial risk who developed schizophrenia a decade later, with a classification accuracy of 94% (Gooding et al., 2012).

These two constructs of TLC positive and negative thought disorder in natural language can also be assessed using automated natural language processing (NLP) analytics. Specifically, TLC positive thought disorder (e.g. tangentiality and derailment) has been conceptualized as a decrease in semantic or discourse coherence, which can be assessed using NLP analytics such as latent semantic analysis (LSA) (Landauer and Dumais, 1997) (Landauer et al., 1998), and other related corpus-based linguistic approaches that use word embeddings (Word2Vec, GloVe, and BERT) (Corcoran et al., 2020). Elvevåg and colleagues used LSA to discriminate language in schizophrenia, finding decreases in LSA semantic coherence related to TLC ratings of thought disorder and functional impairment (Elvevag et al., 2007) (Elvevåg et al., 2010). Decreases in LSA semantic coherence have also been found to be predictive of psychosis onset among young people at clinical high risk (Bedi et al., 2015) (Corcoran et al., 2018). Herein, we use BERT to assess semantic coherence, given its advantages over LSA.

TLC negative thought disorder, characterized by poverty of speech and its content, has been conceptualized as a decrease in syntactic complexity, which can be assessed using NLP analytics such as speech graphs (Mota et al., 2012), semantic density analyses (Rezaii et al., 2019), and metrics from part-of-speech (POS) tagging (Santorini, 1990) (Bedi et al., 2015) (Corcoran et al., 2018). Speech graph analysis shows that reduced size of strongly connected subgraphs is predictive of psychosis onset (Spencer et al., 2020), specifically schizophrenia diagnosis (Mota et al., 2017), and is related to negative symptoms (Mota et al., 2017). Similarly, POS tagging analytics show that two metrics of reduced syntactic complexity, shorter sentence length and decreased use of words that introduce dependent clauses (e.g. complementizer or determiner pronouns such as “that” and “which”), were predictive of psychosis onset (primarily schizophrenia) and related to negative symptoms (Bedi et al., 2015) (Corcoran et al., 2018). Herein, we focus on POS tagging of sentence length and determiner pronouns as metrics of complexity.

While these NLP analytic approaches of coherence and complexity would seem to map onto Andreasen’s heuristic of positive and negative thought disorder, only empirical studies of association of these NLP metrics with the “ground truth” of TLC ratings can support the claim of construct validity. More than a decade ago, Elvevåg stratified a cohort of schizophrenia patients by global TLC ratings, finding decreased LSA coherence in the “high thought disorder” subgroup (Elvevag et al., 2007). More recently, in a dataset of YouTube videos of interviews with twenty schizophrenia patients, decrease in LSA coherence was associated with TLC circumstantiality, and total words with TLC poverty of speech, suggesting that NLP metrics have separate hypothesized associations with TLC positive and negative TD items among individuals with psychotic disorder (Krell et al., 2021).

In the current study, we solicited speech through qualitative interview from a cohort of 60 individuals at clinical high risk (CHR) for psychosis, and 27 healthy controls, obtaining both TLC clinical ratings and NLP metrics of coherence and complexity in order to examine construct validity of NLP. We hypothesized that among CHR individuals, TLC measures of tangentiality, circumstantiality and derailment (i.e., TLC positive thought disorder items) would be associated with NLP measures of semantic coherence. Likewise, we hypothesized that TLC measures of poverty of speech and poverty of content (i.e., TLC negative thought disorder items) would be associated with POS syntactic complexity metrics of sentence length and determiner pronoun usage, which we previously found to be associated with negative symptom severity (Bedi et al., 2015). Other exploratory hypotheses were tested, with correction for multiple comparisons.

2. Methods

2.1. Participants

Participants included 60 individuals at clinical high risk (CHR) for psychosis, and 27 healthy controls (HC) ascertained in the New York City metropolitan area between 2016 and 2020. Recruitment occurred through online advertising and flyers as well as referrals from outpatient clinics, schools, and the community. All participants were administered the Structured Interview for Psychosis-Risk Syndromes/Scale for Psychosis-Risk Syndromes (SIPS/SOPS; Woods et al., 2019) to assess eligibility and symptom severity. All CHR participants met criteria for Attenuated Psychosis Syndrome (APS) and were help-seeking. An additional inclusion criterion was fluency in English. Exclusion criteria included history of threshold psychosis (determined by the Presence of Psychosis (POPS) criteria on the SIPS/SOPS), any major neurological or medical disorder, and IQ less than 70. All participants were administered the Structured Clinical Interview for DSM-5 (SCID-5; First et al., 2015). Exclusion criteria for HCs included any SCID-5 diagnoses except specific phobia. Written informed consent was obtained from subjects older than 18 while assent and parental consent were obtained from minors and their legal guardians respectively. This study was approved by the Institutional Review Boards at the New York State Psychiatric Institute and the Icahn School of Medicine at Mount Sinai.

2.2. Speech recording and processing

Speech was elicited in open-ended qualitative interviews. Interviewing techniques followed a phenomenological framework, in which participants were asked “How have things been going for you lately?” as the first question; they were then free to discuss topics of their choosing, with no other predetermined questions from the interviewer (Davidson, 1994) (Ben-David et al., 2014). Interviews lasted 30-40 minutes and were audio recorded and transcribed by a transcription company, TranscribeMe! (https://www.transcribeme.com/). TranscribeMe! is a HIPAA-compliant service with multiple safeguards designed to protect the privacy and security of personal health information. Participants were asked not to give their own names or names of others during the interview, and transcripts were manually de-identified by study staff prior to analysis. All participants were made aware of the process before providing informed consent. Sentence boundaries were determined by the transcription service.

2.3. Thought, Language and Communication (TLC) Scale: Manual ratings

Clinical ratings were calculated using the TLC (Andreasen, 1986), which has a heuristic of positive and negative thought disorder evident in spoken language. Positive thought disorder items include tangentiality, circumstantiality and derailment while negative items include poverty of speech and poverty of content of speech. The TLC has 18 items, of which nine are scored on a scale of 0-4 (absent, mild, moderate, severe and extreme) and nine are scored on a scale of 0-3 (absent to severe, excluding “extreme”).

For this study, training in the TLC was provided by Drs. Vijay Mittal and Matilda Azis (co-authors). Consensus ratings were reached between raters and trainers for three transcripts (one HC and two CHR) before analysis began. Each transcript was then reviewed by a trained member of the research team (CS, ZRB), and ratings for each item were generated. Complete blinding of raters was not possible, given topics discussed. As scores for TLC items are based on 50-minute interviews, scores were prorated and adjusted accordingly for the shorter interview durations in this study. Of note, self-reference was not calculated, as the interview style used was inherently centered around subjects and they were expected to speak about themselves. Also, as the cohort consisted of both healthy individuals and those with attenuated psychotic symptoms, a number of TLC items characteristic of more chronic psychotic illness were not observed (e.g., stilted speech, neologisms, clanging, and phonemic and semantic paraphasia) or were typically marked as absent or mild (distractibility, word approximation, incoherence, echolalia and blocking); these items were therefore excluded from further analysis (see Table 2).

Table 2.

Automated language features and clinical ratings

LANGUAGE FEATURES	Clinical High-Risk	Healthy Controls

Automated NLP features

Min Semantic Coherence	0.28 [0.07]	0.27 [0.06]
Max Sentence Length	64.0 [18.3]	69 [16.9]
Determiner pronouns	0.14 [0.02]	0.13 [0.02]

Clinical TLC items	Mean [SD]

*Positive Thought Disorder*

Tangentiality* (Scale of 0-4)	2.4 [1.5]	1.1 [1.0]
Derailment* (Scale of 0-4)	1.2 [1.4]	0.4 [0.7]
Circumstantial (Scale of 0-3)	1.1 [1.0]	1.0 [1.1]

*Negative Thought Disorder*

Poverty of Speech (Scale of 0-4)	0.8 [0.9]	0.8 [1.0]
Poverty of Content* (Scale of 0-4)	1.6 [1.0]	0.9 [0.7]

*Other*

Pressure* (Scale of 0-4)	0.7 [1.2]	0.3 [0.7]
Incoherence (Scale of 0-4)	0.3 [0.7]	0.0 [0.0]
Illogicality* (Scale of 0-4)	0.9 [1.1]	0.04 [0.2]
Word Approximation (Scale of 0-3)	0.5 [0.7]	0.04 [0.2]
Distractible (Scale of 0-4)	0.3 [0.6]	0.1 [0.4]
Loss of Goal (Scale of 0-3)	0.8 [1.1]	0.3 [0.6]
Perseveration* (Scale of 0-3)	0.6 [1.1]	0.0 [0.0]
Echolalia (Scale of 0-3)	0.2 [0.7]	0.1 [0.3]
Blocking (Scale of 0-3)	0.2 [0.6]	0.2 [0.4]
Clanging (Scale of 0-4)	0 [0]	0 [0]
Neologisms (Scale of 0-3)	0 [0]	0 [0]
Stilted Speech (Scale of 0-3)	0 [0]	0 [0]

* p < .05

2.4. Natural language processing (NLP)

Automated linguistic variables were calculated using natural language processing analytics. We used the Natural Language Toolkit (NLTK), an open-source library available on the Internet (www.nltk.org). The transcripts were first preprocessed and cleaned by removing punctuation (commas, periods, etc.) and converting the sentences to lower case. Stop words and filler words were accounted for using NLTK, and dysfluencies remained intact. Preprocessing was followed by tokenization and part-of-speech (POS) tagging, which in NLTK is based upon the Penn Treebank tag set (Santorini, 1990). The frequency of usage of each part of speech was estimated, so as to help delineate sentences and to characterize them in respect to complexity (e.g., the use of complementizer or determiner pronouns such as “which” and “that”, which introduce dependent clauses). The ‘sent_tokenize’ function from NLTK was used to separate sentences based on boundaries in transcripts.
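As an illustration only, the two complexity metrics described above can be sketched from POS-tagged sentences. This is not the study's code: the input is assumed to be already tokenized and tagged (for example by a Penn Treebank tagger such as the one in NLTK), sentence length is assumed to be counted in tokens, and the choice of tags treated as "determiner pronouns" (`DETERMINER_TAGS`) is a hypothetical parameter that the text does not specify.

```python
from collections import Counter

# Assumption: which Penn Treebank tags count as "determiner pronouns" is not
# stated in the text; this set is a placeholder that can be changed.
DETERMINER_TAGS = frozenset({"DT", "WDT"})


def complexity_metrics(tagged_sentences, determiner_tags=DETERMINER_TAGS):
    """Return (maximum sentence length, determiner proportion) for a transcript.

    tagged_sentences: list of sentences, each a list of (token, POS tag) pairs.
    Sentence length is counted in tokens (an assumption of this sketch).
    """
    lengths = [len(sentence) for sentence in tagged_sentences]
    tag_counts = Counter(
        tag for sentence in tagged_sentences for _, tag in sentence
    )
    total_tokens = sum(lengths)
    determiner_count = sum(tag_counts[tag] for tag in determiner_tags)
    max_length = max(lengths, default=0)
    proportion = determiner_count / total_tokens if total_tokens else 0.0
    return max_length, proportion


# Toy transcript, hand-tagged for illustration only.
example = [
    [("i", "PRP"), ("saw", "VBD"), ("the", "DT"), ("dog", "NN")],
    [("it", "PRP"), ("ran", "VBD")],
]
max_len, det_prop = complexity_metrics(example)
```

Keeping the tag set as a parameter makes it easy to check how sensitive the determiner metric is to the tagging convention chosen.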

Semantic coherence is defined as the flow in the meaning of sentences. Sentence-level semantic coherence in speech was estimated using Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) to obtain sentence embeddings. The BERT model is similar to latent semantic analysis (LSA) but generates embeddings that are “context aware”. In this bidirectional model, the information for the context of each word comes from words that both follow and precede it. As in LSA, we vectorized the consecutive sentences using BERT and then calculated the cosine (similarity) between successive sentence vectors. We focused on minimum semantic coherence, as well as maximum sentence length and decreased use of determiner pronouns, as these have had relevance as predictors of psychosis onset among CHR individuals (Bedi et al., 2015).
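The coherence computation described above (cosine similarity between successive sentence vectors, then the minimum over a transcript) can be sketched as follows. This is a minimal illustration under stated assumptions: the sentence vectors are toy two-dimensional lists standing in for BERT sentence embeddings, and the function names are hypothetical rather than taken from the study's code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def minimum_coherence(sentence_vectors):
    """Smallest cosine similarity between consecutive sentence vectors."""
    if len(sentence_vectors) < 2:
        raise ValueError("need at least two sentences")
    similarities = [
        cosine_similarity(a, b)
        for a, b in zip(sentence_vectors, sentence_vectors[1:])
    ]
    return min(similarities)


# Toy vectors standing in for sentence embeddings (illustration only).
toy_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 2.0]]
min_coh = minimum_coherence(toy_vectors)
```

Taking the minimum rather than the mean makes the metric sensitive to a single abrupt shift in meaning between adjacent sentences.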

2.5. Data Analysis

Shapiro-Wilk tests were done for each TLC rating and NLP metric to assess whether data were normally distributed. Group differences in TLC ratings and NLP metrics were assessed with Mann-Whitney U tests. Spearman correlations were used to test hypothesized associations of NLP coherence with TLC positive thought disorder items (tangentiality, derailment and circumstantiality), and of NLP metrics of syntactic complexity (maximum sentence length and use of determiner pronouns) with TLC negative thought disorder items (poverty of speech and poverty of content). Alpha was set at .05 for the seven hypothesized associations. An additional twenty post-hoc tests of correlations among NLP metrics and TLC ratings, and with SIPS/SOPS measures of language disturbance, were done with correction for multiple comparisons.
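For readers unfamiliar with the rank-based correlation used here, a Spearman coefficient can be sketched as a Pearson correlation computed on average ranks, which handles the ties that are common in ordinal ratings such as TLC items. The sketch below is illustrative only: the data are invented toy values, not study data, and the function names are hypothetical. A statistics package would normally be used and would also supply p-values.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented toy values: lower coherence paired with higher item ratings.
toy_coherence = [0.35, 0.30, 0.28, 0.22, 0.15]
toy_rating = [0, 1, 1, 3, 4]
rho = spearman_rho(toy_coherence, toy_rating)
```

In this toy example the coefficient is negative, the same direction as the hypothesized association between coherence and positive thought disorder severity.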

3. Results

3.1. Descriptive statistics, including group differences

There were 60 CHR individuals and 27 healthy volunteers, who were similar in age (early twenties on average) and sex distribution (~50% male); the CHR individuals were characteristic of other CHR cohorts in respect to symptom severity and medication exposure (Table 1). CHR individuals had higher TLC ratings than healthy individuals for many TLC items of interest, including tangentiality and derailment, though did not differ in circumstantiality, poverty of content or poverty of speech, nor in NLP metrics of coherence and complexity (Table 2). Analyses of group differences accounted for non-normal distribution of ratings, and were adjusted for age, race, and sex.

Table 1.

Demographics and symptom ratings

DEMOGRAPHICS	Clinical High-Risk	Healthy Controls

Age (Mean [SD])	23 [5]	25 [4]
Sex (% Male)	53	52
Race (%)
White	25	48
Black	30	19
Asian	15	33
More than One Race	15	0
Other	15	0
Antipsychotic use (% using)	25	0
Anxiolytic use (% using)	5	0
Antidepressant use (% using)	29	0

SIPS/SOPS	Mean [SD]

P5 (Disorganized Communication)	2.6 [1.0]	0.4 [0.6]
Total Positive Symptoms	14.4 [3.2]	1.8 [2.4]
N5 (Ideational Richness)	1.5 [1.3]	0.4 [0.6]
Total Negative Symptoms	14.6 [6.4]	1.4 [1.5]

Of note, no associations were found for any TLC item scores with sex or race/ethnicity. Within the CHR cohort, minors (N = 9) had greater TLC poverty of speech (U = 111, p = 0.008), shorter sentence length (U = 98, p = 0.007) and decreased use of determiners (U = 115, p = 0.02). Prescription of antipsychotics (yes/no) within the CHR cohort (N = 15) was associated with greater poverty of speech (U = 191, p = 0.009), but less derailment (U = 200, p = 0.02) and less circumstantial speech (U = 177, p = 0.005). No associations with language variables were found for prescription of other medications.

3.2. Construct validity: Spearman Correlations among NLP and TLC items

Correlations between NLP and TLC features were limited to the CHR group, given lack of variance in TLC scores among healthy controls. Three key NLP features (Bedi et al., 2015) were tested for hypothesized correlations with TLC items, specifically NLP minimum semantic coherence with TLC “positive thought disorder” features of tangentiality, derailment, and circumstantiality, and NLP metrics of complexity (sentence length and use of determiner pronouns) with “negative thought disorder” features of TLC poverty of content and poverty of speech (Figure 1). Exploratory analyses focused on additional TLC items of pressured speech, loss of goal and perseveration (Krell et al., 2021), and on inverse associations of NLP coherence with TLC negative thought disorder, and of NLP complexity with TLC positive thought disorder, as well as association of NLP coherence with SIPS/SOPS P5 “disorganized communication” and of NLP complexity with SIPS/SOPS N5 “Decreased ideational richness”. For the twenty exploratory analyses, there was correction for multiple comparisons (alpha = .05/20 = .0025).
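The corrected threshold stated above follows from dividing the family-wise alpha by the number of exploratory tests (.05/20 = .0025), which is a Bonferroni-style correction. A minimal sketch of applying that threshold is shown below; the function names, labels and p-values are hypothetical and are not results from this study.

```python
def corrected_alpha(family_alpha, n_tests):
    """Per-test threshold obtained by dividing alpha by the number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def survives_correction(p_values, n_tests, family_alpha=0.05):
    """Map each test label to True if its p-value is below the threshold."""
    threshold = corrected_alpha(family_alpha, n_tests)
    return {label: p < threshold for label, p in p_values.items()}


# Hypothetical p-values for two of twenty exploratory tests.
threshold = corrected_alpha(0.05, 20)
decisions = survives_correction({"test_a": 0.001, "test_b": 0.004}, n_tests=20)
```

Passing the number of tests explicitly matters here, because the threshold depends on the full family of twenty exploratory analyses and not only on the p-values being inspected.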

Figure 1. — Correlation matrix of TLC items and NLP metrics.

*Denotes significance of p between .001 and .005

** Denotes significance of p < .001

3.2.1. Correlations of NLP minimum semantic coherence with TLC item severity

NLP minimum semantic coherence is a metric for which a lower value indicates less coherence. As hypothesized, lower minimum semantic coherence was associated with greater severity of tangentiality (r_s = −.74, p < .001) (Figure 2), derailment (r_s = −.56, p < .001) and circumstantiality (r_s = −.58, p < .001), supporting construct validity. In exploratory analyses, NLP minimum semantic coherence was also inversely associated with TLC pressure of speech (r_s = −.48, p < .001) and loss of goal (r_s = −.56, p < .001). Also in exploratory analyses, minimum semantic coherence was positively correlated with the TLC negative thought disorder item of poverty of speech (r_s = .55, p < .001), such that lower coherence accompanied less poverty of speech, consistent with a decrease in coherence being observed only in the context of sufficient complexity of speech. These post-hoc correlations survived correction for multiple comparisons.

Figure 2. — Spearman correlation of tangentiality (TLC) and minimum semantic coherence (NLP).

3.2.2. Correlations of NLP maximum sentence length with TLC item severity

As hypothesized, maximum sentence length was negatively associated with TLC negative thought disorder items of poverty of speech (r_s = −.29, p = .006) and poverty of content (r_s = −.38, p = .007), supporting construct validity. In exploratory analyses, of interest, maximum sentence length was also associated with TLC positive thought disorder items of tangentiality (r_s = .44, p < .001) (Figure 3) and circumstantiality (r_s = .30, p = .006), as well as pressure of speech (r_s = .24, p = .007) and perseveration (r_s = −.34, p = .004), again suggesting that sufficient complexity of speech is needed for a decrease in coherence to be observed. After correction for multiple comparisons (threshold p < .0025), these exploratory associations were significant or at a trend level.

Figure 3. — Spearman correlation of tangentiality (TLC) and maximum sentence length (NLP).

3.2.3. Correlations of NLP determiner usage with TLC item severity

Like maximum sentence length, the other NLP feature of complexity, NLP determiner usage was, as hypothesized, associated with TLC negative thought disorder items of poverty of speech (r_s = −.40, p = .001) (Figure 4) and poverty of content (r_s = −.34, p = .01), supporting construct validity. In exploratory analyses, of interest, NLP determiner use was also associated with TLC positive thought disorder items of tangentiality (r_s = .30, p < .01) and circumstantiality (r_s = .30, p = .02), as well as with pressure of speech (r_s = .27, p = .004) and perseveration (r_s = −.34, p = .006), yet again suggesting that speech must have sufficient complexity for decreases in coherence to be observed (though these associations do not clearly survive correction for multiple comparisons at p < .0025).

Figure 4. — Spearman correlation of poverty of speech (TLC) and use of determiners (NLP).

3.2.4. Correlations of NLP metrics with SIPS ratings of language disturbance

In an exploratory analysis, NLP features were assessed for association with SIPS/SOPS clinical ratings of language disturbance, specifically P5 (“disorganized communication”) and N5 (“decreased ideational richness”), correcting for age, sex and ethnicity. The only support for construct validity was an association of sentence length (but not determiner use) with N5 (r_s = −.29, p = .03), whereas SIPS P5 had no association with NLP coherence (r_s = −.008, p = .96).

4. Discussion

The main aim of this study was to determine the construct validity of automated natural language processing (NLP) metrics of semantic coherence and syntactic complexity, as applied to transcripts of open-ended interviews that were also rated using Andreasen’s Scale for the Assessment of Thought, Language and Communication (TLC), with its heuristic of positive and negative thought disorder. It is appropriate that Andreasen’s TLC scale should be used to test the construct validity of NLP metrics, as she emphasized the inference of disorganization in thought simply by observing a “patient’s speech and language behavior”, “without complicated experimental procedures” and “without any attempt to characterize the underlying cognitive processes”, in other words, the person’s natural language (Andreasen, 1986). Further, her heuristic of negative and positive thought disorder is reflected in her statement that in schizophrenia, the speaker “violates the syntactical and semantic conventions which govern language usage” (Andreasen, 1986).

In support of construct validity, as hypothesized, we found that in a psychosis risk cohort, decreased NLP semantic coherence was highly correlated with TLC positive thought disorder items of tangentiality, circumstantiality and derailment, and that reduced NLP syntactic complexity metrics of sentence length and determiner pronoun usage were significantly associated with TLC negative thought disorder items of poverty of content and poverty of speech. Our findings of construct validity are consistent with recent findings in a schizophrenia cohort, in whom LSA coherence was associated with TLC positive thought disorder items of circumstantiality (r = −0.36) and perseveration (r = −0.77), and in whom total words spoken were associated with TLC poverty of speech (r = −0.70) (Krell et al., 2021). Further support for an association of NLP metrics and TLC ratings more broadly comes from prior studies, in which LSA coherence was decreased in schizophrenia subgroups with higher TLC ratings (Elvevag et al., 2007), and in which NLP coherence and sums of TLC ratings had comparable classification accuracy in distinguishing schizophrenia from the norm (Sarzynska-Wawer et al., 2021).

Our exploratory findings also support that these are separable constructs: decreases in NLP coherence were “anticorrelated” with TLC negative thought disorder items, and TLC positive thought disorder items were “anticorrelated” with reduced NLP syntactic complexity. Specifically, NLP coherence was associated with TLC poverty of speech both in our psychosis risk cohort and previously in a schizophrenia cohort (Krell et al., 2021). Correspondingly, we found lower NLP syntactic complexity to be associated with less severe TLC positive thought disorder items of tangentiality and circumstantiality. Together, these data suggest that spoken language across the psychosis spectrum must have sufficient complexity for decreases in coherence to be observed, and that these metrics are separable.

The establishment of construct validity for NLP metrics of semantic coherence and syntactic complexity is important for their development as biomarkers of language disturbance in schizophrenia and its risk states, including for mechanistic studies. For example, these NLP semantic and syntactic features have also been found to be correlated in CHR patients with structural and functional connectivity measures in the language network (Haas et al., 2020). Much work remains to be done on their natural variation in the population at large, as we found here that in a CHR cohort, teens had less syntactic complexity than adults. Also, consistent with work by de Boer et al. (2020), decrease in syntactic complexity is associated with the prescription of antipsychotic medication among patients, though here we also observed an association with greater coherence. Finally, most studies of NLP metrics of language disturbance have been done in English, though increasingly there are NLP studies in other languages, including Portuguese (Mota et al., 2012), Dutch (de Boer et al., 2020), Polish (Sarzynska-Wawer et al., 2021), Spanish and Mandarin (unpublished data).

This study has several limitations, including that raters in this study were not fully blinded with respect to group when calculating TLC ratings, which future work should seek to rectify. While this may have led to inflation of TLC ratings, we do not see how it could account for evidence of construct validity for NLP metrics of coherence and complexity. Another limitation is that clinical SIPS/SOPS ratings were not obtained in the same session as the open-ended interview, such that their use for establishing construct validity for NLP metrics may have been compromised. Also, unfortunately, we did not have measures of motor disturbance such as extrapyramidal symptoms (EPS), that could have shed light on the association of antipsychotic prescription with reduced syntactic complexity in the CHR cohort. Longitudinal study, with careful measurement of EPS, could determine if the initiation of an antipsychotic is associated with an increase in poverty of speech over time. Future studies could also evaluate association of NLP coherence and complexity with neuropsychological domains, to better understand mechanisms.

Further, the CHR sample (N = 60) is modest. While providing a first examination of concurrent validity of NLP metrics of complexity and coherence with clinical ratings of TLC thought disorder in a clinical high-risk cohort, these findings need to be replicated in independent and larger cohorts. Additional measures of language disturbance might provide further validity as well.

In sum, we have established the construct validity of NLP metrics of coherence and complexity, in respect to clinical ratings of positive and negative thought disorder, using the gold-standard Scale for the Assessment of Thought, Language and Communication, developed decades ago by Andreasen. Further, we have shown that these two constructs of “NLP coherence/TLC positive thought disorder” and “NLP complexity/TLC negative thought disorder” are distinct.

Future directions will focus on how these metrics of coherence and complexity in language are related to other speech acoustic features. In an earlier CHR cohort, these same measures of syntactic complexity, sentence length and determiner usage, were both correlated with increased and longer pauses, and with negative symptoms (Stanislawski et al., 2020). Similarly, in the recent study of schizophrenia patients, acoustic features were also analyzed, finding that a smoothed pitch contour was associated with latency in speech (Krell et al., 2021). Finally, language features can also be examined in concert with other potential biomarkers of outcome in CHR cohorts, including auditory processing deficits, brain structure and connectivity, fluid biomarkers, and cognition.

Acknowledgements

This work was supported by the National Institutes of Health: R01MH107558. We are grateful to the participants who made this research possible and the research staff for their contributions.

Role of funding source

The funding source had no role in study design; collection, analysis and interpretation of data; writing of the report; or the decision to submit the article for publication.

Declaration of competing interest

None.

References

Andreasen NC, 1986. Scale for the assessment of thought, language, and communication (TLC). Schizophr. Bull 10.1093/schbul/12.3.473 [DOI] [PubMed] [Google Scholar]
Andreasen NC, 1979. Thought, language, and communication disorders: II. Diagnostic significance. Arch. Gen. Psychiatry 36, 1325–1330. [DOI] [PubMed] [Google Scholar]
Bearden CE, Wu KN, Caplan R, Cannon TD, 2011. Thought disorder and communication deviance as predictors of outcome in youth at clinical high risk for psychosis. J. Am. Acad. Child Adolesc. Psychiatry. 10.1016/j.jaac.2011.03.021 [DOI] [PMC free article] [PubMed] [Google Scholar]
Bedi G, Carrillo F, Cecchi GA, Slezak DF, Sigman M, Mota NB, Ribeiro S, Javitt DC, Copelli M, Corcoran CM, 2015. Automated analysis of free speech predicts psychosis onset in high-risk youths. npj Schizophr. 1, 1–7. 10.1038/npjschz.2015.30
Ben-David S, Birnbaum ML, Eilenberg ME, DeVylder JE, Gill KE, Schienle J, Azimov N, Lukens EP, Davidson L, Corcoran CM, 2014. The subjective experience of youths at clinically high risk of psychosis: A qualitative study. Psychiatr. Serv 65. 10.1176/appi.ps.201300527
Brown M, Kuperberg GR, 2015. A hierarchical generative framework of language processing: Linking language perception, interpretation, and production abnormalities in schizophrenia. Front. Hum. Neurosci 10.3389/fnhum.2015.00643
Corcoran CM, Carrillo F, Fernández-Slezak D, Bedi G, Klim C, Javitt DC, Bearden CE, Cecchi GA, 2018. Prediction of psychosis across protocols and risk cohorts using automated language analysis. World Psychiatry 17. 10.1002/wps.20491
Corcoran CM, Mittal VA, Bearden CE, Gur RE, Hitczenko K, Bilgrami Z, Savic A, Cecchi GA, Wolff P, 2020. Language as a biomarker for psychosis: A natural language processing approach. Schizophr. Res 226, 158–166. 10.1016/j.schres.2020.04.032
de Boer JN, Voppel AE, Brederoo SG, Wijnen FNK, Sommer IEC, 2020. Language disturbances in schizophrenia: the relation with antipsychotic medication. NPJ Schizophr. 6. 10.1038/S41537-020-00114-3
Devlin J, Chang MW, Lee K, Toutanova K, 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL HLT 2019 - 2019 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. - Proc. Conf. 1, 4171–4186.
Elvevåg B, Foltz P, Weinberger D, Goldberg T, 2007. Quantifying incoherence in speech: an automated methodology and novel application to schizophrenia. Schizophr. Res 93, 304–316. 10.1016/J.SCHRES.2007.03.001
Elvevåg B, Foltz PW, Rosenstein M, DeLisi LE, 2010. An automated method to analyze language use in patients with schizophrenia and their first-degree relatives. J. Neurolinguistics. 10.1016/j.jneuroling.2009.05.002
Gooding D, Ott S, Roberts S, Erlenmeyer-Kimling L, 2013. Thought disorder in mid-childhood as a predictor of adulthood diagnostic outcome: findings from the New York High-Risk Project. Psychol. Med 43, 1003–1012. 10.1017/S0033291712001791
Gooding DC, Coleman MJ, Roberts SA, Shenton ME, Levy DL, Erlenmeyer-Kimling L, 2012. Thought disorder in offspring of schizophrenic parents: Findings from the New York high-risk project. Schizophr. Bull 10.1093/schbul/sbq061
Haas SS, Doucet GE, Garg S, Herrera SN, Sarac C, Bilgrami ZR, Shaik RB, Corcoran CM, 2020. Linking language features to clinical symptoms and multimodal imaging in individuals at clinical high risk for psychosis. Eur. Psychiatry 63. 10.1192/J.EURPSY.2020.73
Krell R, Tang W, Hänsel K, Sobolev M, Cho S, 2021. Lexical and Acoustic Correlates of Clinical Speech Disturbance in Schizophrenia.
Landauer TK, Dumais ST, 1997. A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. Psychol. Rev 10.1037/0033-295X.104.2.211
Landauer TK, Foltz PW, Laham D, 1998. An introduction to latent semantic analysis. Discourse Process. 10.1080/01638539809545028
Marengo JT, Harrow M, 1997. Longitudinal Courses of Thought Disorder in Schizophrenia and Schizoaffective Disorder.
Mota NB, Copelli M, Ribeiro S, 2017. Thought disorder measured as random speech structure classifies negative symptoms and schizophrenia diagnosis 6 months in advance. npj Schizophr. 3, 1–10.
Mota NB, Vasconcelos NAP, Lemos N, Pieretti AC, Kinouchi O, Cecchi GA, Copelli M, Ribeiro S, 2012. Speech graphs provide a quantitative measure of thought disorder in psychosis. PLoS One 7, e34928.
Rezaii N, Walker E, Wolff P, 2019. A machine learning approach to predicting psychosis using semantic density and latent content analysis. npj Schizophr. 5, 1–12. 10.1038/S41537-019-0077-9
Roche E, Creed L, Macmahon D, Brennan D, Clarke M, 2015. The Epidemiology and Associated Phenomenology of Formal Thought Disorder: A Systematic Review. Schizophr. Bull 10.1093/schbul/sbu129
Santorini B, 1990. Part-of-Speech Tagging Guidelines for the Penn Treebank Project (3rd Revision). Univ. Pennsylvania 3rd Revis. 2nd Print. 10.1017/CBO9781107415324.004
Sarzynska-Wawer J, Wawer A, Pawlak A, Szymanowska J, Stefaniak I, Jarkiewicz M, Okruszek L, 2021. Detecting formal thought disorder by deep contextualized word representations. Psychiatry Res. 304, 114135. 10.1016/J.PSYCHRES.2021.114135
Spencer TJ, Thompson B, Oliver D, Diederen K, Demjaha A, Weinstein S, Morgan SE, Day F, Valmaggia L, Rutigliano G, De Micheli A, Mota NB, Fusar-Poli P M.P., 2020. Lower speech connectedness linked to incidence of psychosis in people at clinical high risk. Schizophr. Res 20, 30458.
Stanislawski E, Bilgrami Z, Sarac C, Garg S, Heisig S, Cecchi G, Agurto C, Corcoran C, 2020. Negative Symptoms and Speech Pauses in Youths at Clinical High Risk for Psychosis. npj Schizophr. in press.
Woods SW, Walsh BC, Powers AR, McGlashan TH, 2019. Reliability, Validity, Epidemiology, and Cultural Variation of the Structured Interview for Psychosis-Risk Syndromes (SIPS) and the Scale of Psychosis-Risk Symptoms (SOPS), in: Handbook of Attenuated Psychosis Syndrome Across Cultures. Springer International Publishing, pp. 85–113. 10.1007/978-3-030-17336-4_5
Yalincetin B, Bora E, Binbay T, Ulas H, Akdede BB, Alptekin K, 2017. Formal thought disorder in schizophrenia and bipolar disorder: A systematic review and meta-analysis. Schizophr. Res 185, 2–8.
