Schizophrenia Bulletin. 2020 May 18;46(Suppl 1):S87. doi: 10.1093/schbul/sbaa031.202

S136. CLASSIFYING SCHIZOPHRENIA USING PHONOLOGICAL, SEMANTIC AND SYNTACTIC FEATURES OF LANGUAGE; A COMBINATORY MACHINE LEARNING APPROACH

Alban Voppel 1, Janna de Boer 2, Fleur Slegers 3, Hugo Schnack 2, Iris Sommer 1
PMCID: PMC7234480

Abstract

Background

The diagnosis of schizophrenia is currently based on anamnesis and psychiatric examination only. Language biomarkers may provide a quantitative and reproducible risk estimate for this spectrum of disorders. People with schizophrenia spectrum disorders may show one or more language abnormalities, such as incoherence, affective flattening, failure of reference, and changes in sentence length and complexity. However, the clinical picture varies widely between individuals, and language abnormalities will reflect this heterogeneity.

Computational linguistics can be used to quantify these features of language. Because of the heterogeneous character of the various symptoms present in schizophrenia spectrum subjects, we expect some subjects to show semantic incoherence, while others may have more affective symptoms such as monotonous speech. Here, we combine phonological, semantic and syntactic features of semi-spontaneous language with machine learning algorithms for classification in order to develop a biomarker sensitive to the broad spectrum of schizophrenia.

Methods

Semi-spontaneous natural language samples were collected from 50 subjects with schizophrenia spectrum disorders and 50 controls matched for age, gender and parental education, using recorded, neutral-topic, open-ended interviews. The audio samples were coded by speaker; audio belonging to the subject was extracted and transcribed. Phonological features were extracted using OpenSMILE, semantic features were calculated using a word2vec model with a moving-window coherence approach, and syntactic features were calculated using the T-scan tool. Feature reduction was applied within each domain. To distinguish the groups, machine learning classifiers were trained on each domain using leave-one-out cross-validation, and their results were combined through a voting mechanism.
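As an illustration of the moving-window coherence idea, the sketch below averages word vectors within adjacent windows of a transcript and takes the cosine similarity between neighbouring windows. The abstract does not specify the window size, the similarity measure, or the summary statistics, so the window length, the use of cosine similarity, the function names and the toy two-dimensional embeddings are all assumptions made for illustration; a real analysis would use vectors from a trained word2vec model.

```python
import math
from statistics import mean, pvariance


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def window_vector(words, embeddings):
    """Average the embeddings of the words in one window (None if no word is known)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def moving_window_coherence(tokens, embeddings, window=2):
    """Similarity between each window and the window that directly follows it."""
    sims = []
    for start in range(len(tokens) - 2 * window + 1):
        left = window_vector(tokens[start:start + window], embeddings)
        right = window_vector(tokens[start + window:start + 2 * window], embeddings)
        if left is not None and right is not None:
            sims.append(cosine(left, right))
    return sims


# Hypothetical toy embeddings standing in for a trained word2vec model.
TOY_EMBEDDINGS = {
    "dog": [1.0, 0.0], "cat": [0.9, 0.1], "pet": [0.8, 0.2],
    "tax": [0.0, 1.0], "bank": [0.1, 0.9], "loan": [0.2, 0.8],
}

on_topic = ["dog", "cat", "pet", "cat", "dog", "pet"]
drifting = ["dog", "cat", "pet", "tax", "bank", "loan"]

sims_on_topic = moving_window_coherence(on_topic, TOY_EMBEDDINGS)
sims_drifting = moving_window_coherence(drifting, TOY_EMBEDDINGS)

# Mean and variance of coherence are candidate features per transcript.
print("on topic:", mean(sims_on_topic), pvariance(sims_on_topic))
print("drifting:", mean(sims_drifting), pvariance(sims_drifting))
```

In this toy case the transcript that drifts between topics shows lower mean coherence and higher variance of coherence than the one that stays on topic, which is the kind of per-subject feature a classifier could then use.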

Results

The machine learning classifiers obtained 75–78% accuracy for the semantic, syntactic and phonological domains individually. The most distinguishing features were reduced timbre and intonation in the phonological domain, increased variance of coherence in the semantic domain, and decreased complexity of speech in the syntactic domain. The combined approach, using a voting algorithm across the domains, achieved an accuracy of 83% and a precision of 89%. No significant differences in age, gender or parental education were found between healthy controls and subjects with schizophrenia spectrum disorders.
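To make the relationship between per-domain and combined performance concrete, the following sketch combines three sets of per-domain predictions by hard majority vote and computes accuracy and precision. The abstract does not state which voting scheme was used, so hard majority voting is an assumption here, and the labels are invented toy values for eight hypothetical subjects, not data from the study.

```python
def majority_vote(domain_predictions):
    """Combine per-domain label lists (1 = patient, 0 = control) by hard majority vote."""
    n_domains = len(domain_predictions)
    n_subjects = len(domain_predictions[0])
    combined = []
    for i in range(n_subjects):
        votes = sum(preds[i] for preds in domain_predictions)
        combined.append(1 if votes * 2 > n_domains else 0)
    return combined


def accuracy(y_true, y_pred):
    """Fraction of subjects whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """Fraction of subjects predicted as patients who really are patients."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Invented toy labels (not study data): 4 patients followed by 4 controls.
y_true       = [1, 1, 1, 1, 0, 0, 0, 0]
phonological = [1, 1, 0, 1, 0, 0, 1, 0]
semantic     = [1, 1, 0, 1, 0, 1, 0, 0]
syntactic    = [0, 1, 1, 1, 1, 0, 0, 0]

combined = majority_vote([phonological, semantic, syntactic])

for name, preds in [("phonological", phonological), ("semantic", semantic),
                    ("syntactic", syntactic), ("combined", combined)]:
    print(name, accuracy(y_true, preds), precision(y_true, preds))
```

In this toy example each domain classifier errs on different subjects, so the vote corrects most individual mistakes; when two domains err on the same subject, as for the third subject here, the vote inherits the error.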

Discussion

In this study, we demonstrated that computational features derived from different linguistic domains capture aspects of the symptomatic language of subjects with schizophrenia spectrum disorders. Combining these features improved classification for this heterogeneous disorder, yielding high accuracy and precision in distinguishing schizophrenia patients from healthy controls. These values are better than those obtained with imaging or blood analyses, and language is more easily and cheaply obtained than measures derived from those methods. Validation in an independent sample is required, and further distinguishing features should be extracted within each domain. Our positive results in using language abnormalities to automatically detect schizophrenia show that computational linguistics is a promising method in the search for reliable markers in psychiatry.


Articles from Schizophrenia Bulletin are provided here courtesy of Oxford University Press
