Abstract
Objective:
Acoustic phonetic measures have been found to correlate with negative symptoms of schizophrenia, thus offering a path toward quantitative measurement of such symptoms. These acoustic properties include F1 and F2 measurements (affected by tongue height and tongue forward/back position, respectively), which determine a general “vowel space.” Among patients and controls, we consider two evaluations of the phonetic measures of vowel space (average Euclidean distance from a participant’s mean F1 and mean F2, and density of vowels around one standard deviation of mean F1 and of F2).
Methods:
Structured and spontaneous speech of 148 participants (70 patients and 78 controls) was recorded and measured acoustically. We examined correlations between the phonetic measures of vowel space and ratings of aprosody obtained using two clinical research measures, the Scale for the Assessment of Negative Symptoms (SANS) and the Clinical Assessment Interview for Negative Symptoms (CAINS).
Results:
Vowel space measurements were significantly associated with patient/control status, attributed to a cluster of 13 patients whose phonetic values correspond to reduced vowel space as assessed by both measures. No correlation was found between phonetic measures and relevant items and averages of ratings on the SANS and CAINS. Reduced vowel space appears to affect only a subset of patients with schizophrenia, potentially those on higher antipsychotic dosages.
Conclusions:
Acoustic phonetic measures may be more sensitive measures of constricted vowel space than clinical research rating scales of aprosody or monotone speech. Replications are needed before further interpretation of this novel finding, including potential medication effects.
Keywords: Aprosody, Linguistics, Negative symptoms, Phonetics, Schizophrenia, Vowel space
Introduction
The Scale for the Assessment of Negative Symptoms (SANS; Andreasen, 1983) and Clinical Assessment Interview for Negative Symptoms (CAINS; Blanchard et al., 2011; Forbes et al., 2010) are used in research settings to evaluate negative symptoms of schizophrenia. Negative symptoms as a whole are likely more difficult to classify with quantitative measures than are positive symptoms (Andreasen et al., 1981); one consequence is less progress in this domain of symptoms. To address gaps in effective treatments for patients with schizophrenia, researchers are motivated to connect qualitative scale ratings with truly quantitative measures, but this has been challenging, with advances limited. Speech-related negative symptoms are a promising place to investigate measurable correlates, as many aspects of speech are quantifiable. The phonetic quality of speech-related negative symptoms can be broadly described as aprosodic, or monotone. The relevant item on the SANS is item 7, part of the Affective Flattening or Blunting subscale, the description of which is “Lack of Vocal Inflections: The patient fails to show normal vocal emphasis patterns, is often monotonic.” On the CAINS, it is item 11, Vocal Expression, described as reflecting “changes in tone during the course of speech,” wherein a severe deficit would register as: “Few if any changes in vocal expressions when describing engagement with family, social and recreational activities. No noticeable change in voice intensity and prosody is monotonous.”
Clinical researchers currently rely on scales like these to measure all schizophrenia-related speech effects, and clinicians nearly always rely on qualitative descriptions without any numerical scales. In the clinical research setting, raters often score patients after the interview has concluded. Exactly which phonetic phenomena a particular clinician registers, as well as the clinician’s ability to ascertain the minute differentiations in each phenomenon, may vary, suggesting a need for a more reliable, quantitative method for assessing negative symptoms as they pertain to speech effects in patients with schizophrenia. Such methods generally rely on the properties of pitch and loudness, discussed below. In this analysis, we investigated a lesser-studied property, vowel space, which is the range of vowel articulations of a speaker.
Pitch is the phonetic property that most closely corresponds to the description of the relevant SANS and CAINS items. Pitch is the perception of the speed at which the air expelled from the lungs passes between the vocal folds, corresponding to the frequency of the opening and closing of the glottis; i.e., fundamental frequency (notated F0). Mean F0 is not expected to be of interest, as males generally have a lower average pitch than females, and as there is great variation within both sexes. Rather, the item of interest in affected speech (i.e., speech affected by negative symptoms) is the amount of pitch variation present in a single speaker’s speech production: affected speech is expected to demonstrate less pitch variation than non-affected speech. Descriptions of affected speech that likely involve (decreased) modulation of pitch include a “monotonic quality” (Andreasen et al., 1981) and having “less inflection” (Alpert et al., 2000), as well as the “flat affect,” “lack of vocal inflections,” and “monotonic” notations of the SANS and “changes in tone,” “changes in vocal expressions,” and “prosody is monotonous” of the CAINS.
Prior research documents that pitch variability significantly associates with patient versus control status (Alpert et al., 2000; Cohen et al., 2008; Kliper et al., 2015; McGilloway, 1997; Püschel et al., 1998; Ross et al., 2001), and correlates significantly with clinical ratings of flat affect (Andreasen et al., 1981). Other studies have found no significant categorical associations (Martínez-Sánchez et al., 2015; Meaux et al., 2018; Rapcan et al., 2010) or no significant correlations with clinical ratings of negative symptoms (Covington et al., 2012). Comparisons across studies, however, are difficult due to variations in the measurement of clinical variables as well as linguistic variables, in addition to variability in cohorts studied (in terms of demographics, nationality/language, and symptom severity). Compton et al. (2018), using the same raw data as the current analysis, found significant differences between controls and patients rated as having aprosody, operationalized as a score of 3–5 on the SANS “Lack of Vocal Inflections” item and a score of 2–4 on the CAINS “Vocal Expression” item. Specifically, compared to controls, those with aprosody had: (1) lower intensity/loudness across all five speech tasks, (2) lower pitch (F0) variation when reading an emotionally stimulating short story excerpt, and (3) lower F2 variability in three of the tasks (one pertaining to spontaneous speech and two involving reading out loud).
Intensity is another phonetic property that may play a role in clinical observations of affected speech. Intensity is perceived as loudness, which is measured in decibels. The intensity of a sound is the perception of the force with which air is expelled from the lungs, corresponding to the physical sound property of amplitude. Affected speech would be overall softer (having lower mean intensity) and/or may have less variability in perceived loudness. Descriptions of affected speech that likely involve intensity include “decrease in variability of…amplitude” (Andreasen et al., 1981) and “emphasis” (Alpert et al., 2000) as well as the “emphasis patterns” of the SANS, and the “changes in vocal expressions” and “voice intensity” of the CAINS. Studies have found that mean intensity and/or intensity variability significantly associate with patient/control status (Kliper et al., 2015; Martínez-Sánchez et al., 2015; McGilloway, 1997; Püschel et al., 1998; Rapcan et al., 2010); other studies have not found such associations (Alpert et al., 2000; Martínez-Sánchez et al., 2015; Meaux et al., 2018).
Amplitude and F0 are physiologically linked: as pressure of airflow out of the lungs increases, the vocal folds inherently vibrate somewhat faster, causing an increase in pitch (Titze, 1989). Changes in amplitude and fundamental frequency—perceived as intensity and pitch changes—commonly occur during an utterance. Intensity is expected to change when the speaker emphasizes certain components of the utterance, or when the speaker expresses the information with emotion. Pitch change across an utterance is referred to linguistically as intonation, or an utterance’s “tune” (Pierrehumbert, 1980). Speakers time their intonation to convey additional meaning regarding the correct parsing of the utterance, the focus of the utterance, and attitude towards the information conveyed (e.g., compare the neutral informative statement, “Sue is coming tomorrow,” to the surprised exclamation “Sue is coming tomorrow?!?”).
A further phonetic property of speech is the component frequencies of particular sounds. These, like pitch and intensity, may be expected to show less variability in the vowels of affected speech. Linguists find that the two most important component frequencies for vowel quality are the first resonant frequency (F1) and the second resonant frequency (F2). F1 and F2 are indicative of tongue height and tongue forward/back position, respectively, at the moment of articulation. A speaker’s vowels can thus be visually represented in a two-dimensional plane. The vowels shown in Figure 1 visually represent a canonical “vowel space” with four example vowels from English. These four vowels consist of the three English vowels often referred to as the “corner” vowels, as well as the most central of the English vowels. Speakers with a reduced vowel space will produce more articulations closer to the center of the potential vowel space. Vowels that are less differentiated are more susceptible to misidentification by listeners.
Figure 1.
Visual representation of theoretical canonical vowel space: four sample vowels with the pronunciations both in English spelling (in quotations) and the International Phonetic Alphabet, which is used to represent sounds unambiguously (in square brackets). The dots represent a speaker with more centralized articulations of the four vowels (a reduced vowel space).
Linguists often study speakers’ vowel spaces, as vowel productions shift based on a variety of factors. For example, the range of unstressed vowels is much smaller than that of stressed vowels in many languages, including Bulgarian (Lehiste and Popov, 1970), English (Lindblom, 1963), and Russian (Halle, 1959), among many others (see Crosswhite, 2001). Vowel space has also been found to reduce at faster talking speeds (e.g., Fourakis, 1991) and when words are uttered in a more predictable context (e.g., Lieberman, 1963), which is an effect that can vary by dialect (Clopper and Pierrehumbert, 2008). Linguistic reductions in vowel space generally occur when a speaker does not need to articulate as strongly to be understood (e.g., because the context is more predictable).
Acoustically, while F1 and F2 correspond to particular vowels, the absolute values of F1 and F2 are speaker-dependent. Perceptually, listeners quickly mentally map a speaker’s vowel space. Thus, while the F1 and F2 of a vowel from two different speakers might be identical, the vowel from Speaker A could be ‘ih’ as in ‘bit’ while the vowel from Speaker B could be ‘eh’ as in ‘bet.’ A corollary is that some speakers use a smaller vowel space compared with others. This is also illustrated in Figure 1, by the dots that represent a speaker with more centralized articulations of the four vowels. Here, we are interested in examining the range of vowel space used by patients with schizophrenia compared with unaffected controls.
Covington et al. (2012) found a significant correlation (r=−0.446) between reduced F2 variability and the severity of negative symptoms (as measured by the Positive and Negative Syndrome Scale) among 25 patients with early-course schizophrenia, but a non-significant correlation with F1 variability (though it was r=−0.339). Compton et al. (2018) expanded on this work and found F2 variability significantly correlated with participant status as a patient with clinically rated aprosody or as a control (in three of the five speaking/reading tasks; they also differed from patients without aprosody in one of those three tasks). Bernardini et al. (2016) also found significant correlations between patient aprosody severity and F2 variability. They found no similar correlation for F1 variability, perhaps because F1 has a smaller range than F2. The present study uses data from Compton et al. (2018) and considers F1 and F2 together as a two-dimensional vowel space. We assess two measures of vowel space that have never been studied in the context of schizophrenia, despite numerous findings pertaining to phonetics and other domains of linguistic measures—Euclidean distance (average distance of all of a speaker’s vowels from the speaker’s mean vowel) and a new measurement, density (proportion of a speaker’s vowels within one standard deviation of the means of F1 and of F2). Despite no extant research on vowel space and schizophrenia, given the aforementioned findings, we hypothesized that patients would show a reduced vowel space compared to controls.
Methods
Participants
This study examines the speech of 148 participants (70 patients and 78 controls), a subset of those described previously (Compton et al., 2018). Some participants did not produce enough speech for measurement or comparison; to prevent alogia or limited speech from confounding the analysis, all participants selected for inclusion in this analysis produced at least 750 vowels (see the speech tasks below). Included patients ranged from having produced 755 to 2,961 vowels, with a mean of 1,747 vowels. Included controls ranged from having produced 751 to 3,583 vowels, with a mean of 1,968 vowels. Data from the reading tasks were only included for participants who scored above a reading grade equivalent of 28 on the Wide Range Achievement Test (WRAT; Jastak & Wilkinson, 1984). In the present subset, 10 controls and 8 patients whose reading data was excluded produced at least 750 vowels in the three spontaneous speech tasks. The other participants’ measured vowels come from both the spontaneous tasks and the reading tasks.
Clinical Measures
After an in-depth, semi-structured clinical interview, a trained research assessor scored the SANS (Andreasen, 1983), which is comprised of 25 items scored on a scale from 0=none to 5=severe. Items are commonly grouped into five subscales: affective flattening or blunting, alogia, avolition-apathy, anhedonia-asociality, and attention. As a secondary measure of negative symptoms, the research assessor rated the CAINS (Blanchard et al., 2011; Forbes et al., 2010), a newer measure of negative symptoms that assesses both experiential and expressive deficits. We also administered the Scale for the Assessment of Positive Symptoms (SAPS; Andreasen, 1984), which consists of 34 items belonging to four subscales: hallucinations, delusions, bizarre behavior, and positive formal thought disorder. Each of the four areas includes ratings for specific symptoms (e.g., auditory hallucinations) as well as a global rating, all scored on a scale from 0=none to 5=severe. Inter-rater reliability across four raters (assessing 14 patients) was calculated using a two-way random effects analysis of variance model. Intraclass correlation coefficients were 0.84 (95% confidence interval (CI): 0.64, 0.94) for the SANS total score and 0.89 (95% CI: 0.72, 0.97) for the SAPS total score. Medication dosages were obtained from medical records made available at participating sites, though some missing data occurred.
Speech Measures
Speech samples were obtained with a Tascam DR-08 recorder with the following specifications: (1) ENCODING: PCM 16-bit 44.1 kHz monaural, (2) LOW CUT: Low 40 Hz, (3) REC EQ: Off, (4) microphone folded to a closed position, (5) built-in stand open, and (6) device placed on a table in front of the participant with the microphone about 12 in. from the participant. Speech was gathered through five tasks: three spontaneous speech tasks and two reading tasks. In the first task, participants were asked to describe a black-and-white line drawing of people in a beach scene. In the second task, they were asked to describe their perfect, most ideal day, and in the third task they were asked for their scariest memory. Participants were instructed to speak for two minutes for each of these tasks, and prompted once if they stopped before two minutes. In the fourth task, they read an emotionally-neutral passage from a novel, and in the fifth, they read an emotionally-charged passage from a different novel. Details can be found in Compton et al. (2018).
Cognitive Measures
Neurocognitive domains were assessed using the MATRICS Consensus Cognitive Battery (MCCB; Kern et al., 2008; Nuechterlein et al., 2008), an hour-long comprehensive cognitive battery compiled by experts for neurocognitive research in schizophrenia. Eight of the ten cognitive tests in the MCCB were included in the study; social cognition was not measured as it was deemed less pertinent than the neurocognitive domains, and sustained attention/vigilance was not measured as it is the only one that is computer-administered.
Data Analysis
Participants’ sound files were annotated to delineate syllabic peaks (i.e., vowels) using Prosogram (Mertens, 2004) and Praat (Boersma et al., 2021). The F1 and F2 at the midpoint of each delineated vowel were extracted using a Praat script (Lennes, 2011), and F0 at the delineated vowels’ midpoint was extracted using VoiceSauce (Shue, 2010). A different measurement of F0 was also extracted every 10 ms throughout the audio recordings using VoiceSauce. Within each participant, measurements were converted to z-scores, and values with |z| > 3.29 were excluded.
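The outlier rule above can be illustrated with a minimal sketch. It assumes the exclusion is applied separately to each measure (for example, one list of F1 values per participant) and that the sample standard deviation is used; the source states neither detail, and the function name `exclude_outliers` is hypothetical.

```python
from statistics import mean, stdev


def exclude_outliers(values, threshold=3.29):
    """Drop measurements whose within-participant |z| exceeds `threshold`."""
    if len(values) < 2:
        return list(values)  # a z-score is undefined for fewer than two values
    mu = mean(values)
    sd = stdev(values)  # assumption: sample SD (n - 1); not stated in the source
    if sd == 0:
        return list(values)  # all values identical, so nothing can be excluded
    return [v for v in values if abs((v - mu) / sd) <= threshold]
```

Because the z-scores are computed within each participant, the rule removes values that are extreme relative to that speaker's own distribution rather than relative to the whole sample.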
Each participant’s vowel space was measured in two ways. First, the average Euclidean distance (ED) from a participant’s mean F1 and mean F2 was calculated, reducing each vowel’s F1 and F2 to a single measurement of distance from the mean vowel, providing a measurement of the average size of a participant’s vowel space. For each vowel, $\mathrm{ED} = \sqrt{(F1 - F1_{\mathrm{mean}})^2 + (F2 - F2_{\mathrm{mean}})^2}$, and these distances were averaged within each participant. Second, for each participant, the density of vowels within one standard deviation of mean F1 and of mean F2 was taken as a proportion of all the participant’s vowels. Density = (number of speaker’s vowels within 1 SD of mean F1 and of mean F2) / (number of speaker’s vowels). Density represents the degree to which a speaker’s vowel space is reduced; i.e., tending to cluster toward the middle. While both ED and density quantify vowel space reduction, ED collapses over F1 and F2, whereas density accounts for any differences between F1 and F2 variation by speaker (e.g., a speaker’s F2 has a bigger range than their F1). Figure 2 depicts how ED and density differ.
Figure 2.
F1 and F2 scatterplots from two participants, the first with ED=298 and density=44.5%, and the second with ED=297 and density=67.4%. While the EDs are basically the same, about two-thirds of the second participant’s vowel points are within one SD around the mean of either F1 or of F2 (or of both), whereas less than half of the first participant’s vowel points are (showing that density is a different measurement from ED).
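The two vowel space measures defined in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it uses the sample standard deviation, treats the one-SD boundary as inclusive, and counts a vowel toward density only when it lies within one SD of the mean on both F1 and F2 (one reading of "within 1 SD of F1 and of F2"). The function name `vowel_space_measures` is hypothetical.

```python
import math
from statistics import mean, stdev


def vowel_space_measures(f1, f2):
    """Return (mean Euclidean distance, density) for one participant's vowels.

    f1 and f2 are equal-length lists of formant values in Hz, one pair per vowel.
    """
    if len(f1) != len(f2) or len(f1) < 2:
        raise ValueError("need at least two (F1, F2) pairs of equal length")
    f1_mean, f2_mean = mean(f1), mean(f2)
    f1_sd, f2_sd = stdev(f1), stdev(f2)  # assumption: sample SD (n - 1)

    # ED: distance of each vowel from the mean vowel, averaged over vowels.
    distances = [math.hypot(a - f1_mean, b - f2_mean) for a, b in zip(f1, f2)]
    ed = mean(distances)

    # Density: share of vowels within one SD of the mean on BOTH formants
    # (assumption about how the two conditions are combined).
    inside = sum(
        1
        for a, b in zip(f1, f2)
        if abs(a - f1_mean) <= f1_sd and abs(b - f2_mean) <= f2_sd
    )
    return ed, inside / len(f1)
```

Scaling the density window by each formant's own SD is what lets density differ between two speakers with the same ED: the window adapts to a wide F2 range, while ED pools both dimensions in raw Hz.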
Results
Sample Characteristics
Sociodemographic characteristics of patients and controls are given in Table 1. Patients were significantly younger, more likely to be male, had completed fewer years of education, were less likely to live with partners or roommates and more likely to live in a structured living arrangement, were less likely to have children, and were less likely to be employed.
Table 1.
Sociodemographic Characteristics of Patients and Controls
| | Patients (n=70) | Controls (n=78) | Test statistic, df, p |
|---|---|---|---|
| Age, in years | 31.0±9.4 | 33.6±9.4 | t=1.69, df=146, p=0.05, d=0.28 |
| Gender, male | 51 (72.9%) | 44 (56.4%) | χ²=4.34, df=1, p=0.04 |
| Ethnicity, non-Hispanic | 66 (94.3%) | 68 (87.2%) | χ²=2.18, df=1, p=0.14 |
| Race | | | χ²=1.59, df=2, p=0.45 |
| Black or African American | 53 (75.7%) | 53 (67.9%) | |
| White or Caucasian | 9 (12.9%) | 16 (20.5%) | |
| Other | 8 (11.4%) | 9 (11.5%) | |
| Marital Status | | | χ²=2.04, df=2, p=0.36 |
| Single and never married | 61 (88.4%) | 66 (84.6%) | |
| Married or living with a partner | 3 (4.3%) | 8 (10.3%) | |
| Separated, divorced, or widowed | 5 (7.2%) | 4 (5.1%) | |
| Years of education completed | 12.9±2.5 | 14.2±2.6 | t=3.29, df=146, p<0.001, d=0.54 |
| Who the participant lived with, past month | | | χ²=10.77, df=4, p=0.03 |
| Alone | 16 (22.9%) | 19 (24.4%) | |
| With parent, sibling or other family | 32 (45.7%) | 32 (41.0%) | |
| With boyfriend/girlfriend/spouse/partner | 3 (4.3%) | 10 (12.8%) | |
| With friends or roommates | 5 (7.1%) | 12 (15.4%) | |
| Structured living arrangement or homeless | 14 (20.0%) | 5 (6.4%) | |
| Has children | 14 (20.3%) | 33 (42.3%) | χ²=8.16, df=1, p=0.004 |
| Had a job during the past month | 23 (32.9%) | 50 (64.1%) | χ²=14.41, df=1, p<0.001 |
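As a worked check on the categorical comparisons in Table 1, the sketch below computes a Pearson chi-square statistic for a 2×2 table from the reported counts. It assumes no continuity correction, which the source does not specify; the uncorrected statistic for the gender row comes out close to the reported value of 4.34. The function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Gender row of Table 1: 51 of 70 patients and 44 of 78 controls were male.
gender_chi2 = chi_square_2x2(51, 70 - 51, 44, 78 - 44)
print(f"chi-square = {gender_chi2:.2f}, df = 1")
```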
Vowel Space
A logistic regression run on patient versus control status found a significant contribution of ED (p=0.01) and density (p=0.02). ED and density were moderately correlated in patients (r=0.38, p=0.001) but not in controls (r=0.01, p=0.93). The distribution of ED and density in patients and controls is shown in Figure 3.
Figure 3.
Euclidean distance and density among both patients and controls
A hierarchical cluster analysis (cluster method: within-groups linkage; interval: squared Euclidean distance; on z-scores of the ED and density measurements) on the patient group identified thirteen patients who had a lower ED and higher density than is typical. We can identify these thirteen patients (shown in the lower left portion of Figure 3) as having particularly reduced vowel space, meaning that their vowels are all more acoustically similar to each other. Speakers with a reduced vowel space were visually represented in Figure 1, which showed different vowels clustering toward the center of the acoustic vowel space. When the logistic regression is re-run without these thirteen patients, neither ED (p=0.05) nor density (p=0.58) is a significant predictor of patient/control status.
Correlations with Clinical Research Rating Scales
The SANS and CAINS items that potentially evaluate measurable phonetic characteristics are SANS items 7 (Lack of Vocal Inflections) and 8 (Global Rating of Affective Flattening) and CAINS item 11 (Vocal Expression). Additionally, a general score of each relevant section on the two scales was calculated by averaging the scores within the relevant sections (SANS items 1–8: Affective Flattening or Blunting, and CAINS items 10–13 under section IV: Expression).
Bivariate correlations between the four phonetic measures taken (ED, density, F0 SD for each vowel, and F0 SD every 10 ms) and the SANS and CAINS scores were examined. No significant correlations were observed.
Characterization of the 13 Patients with Reduced Vowel Space
We conducted exploratory analyses in an effort to characterize the group of 13 patients with reduced vowel space. Specifically, we compared those with (n=13) and without (n=57) reduced vowel space using bivariate tests (t-tests and chi-squared tests) along a number of demographic and clinical variables. The 13 participants with reduced vowel space were less likely to be African American (7, 53.8% v. 46, 80.7%; χ²=4.15; df=1; p=0.04) and more likely to have been in special classes for learning or behavioral problems during the school years (8, 61.5% v. 14, 24.6%; χ²=6.72; df=1; p=0.01). In terms of positive symptoms, negative symptoms, and neurocognition, we only report here group differences of greater than a small effect (i.e., d>0.2). SANS and SAPS subscale scores did not differ significantly between the groups. Across eight neurocognitive domains, the 13 participants with reduced vowel space had better category fluency / animal naming (46.8±11.4 v. 38.7±10.4; t=2.49; df=68; p=0.015; d=0.74). Finally, although we only had medication data on 8 of 13 with, and 28 of 57 without, reduced vowel space, a t-test comparing olanzapine dose-equivalents based on the defined daily dosage method (Leucht et al., 2016) revealed a significant difference (18.45±8.37 v. 9.97±7.84; t=2.66; df=34; p=0.006; d=1.06); those with reduced vowel space had a median olanzapine dose-equivalent of 19.95 (range 10–30) and those without reduced vowel space had a median olanzapine dose-equivalent of 8.75 (range 2–40).
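The medication comparison above can be approximately reproduced from the reported summary statistics. The sketch below assumes a pooled-variance (Student's) t-test and Cohen's d based on the pooled standard deviation; the source does not state which variants were used, and the function name `pooled_t_and_d` is hypothetical. Because the inputs are rounded means and SDs, the results are only expected to land close to the reported t=2.66 and d=1.06.

```python
import math


def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Return (t, Cohen's d, df) for two groups described by mean, SD, and n."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    d = (m1 - m2) / pooled_sd  # standardized mean difference
    t = (m1 - m2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, d, df


# Olanzapine dose-equivalents: reduced vowel space (n=8) vs. not reduced (n=28).
t, d, df = pooled_t_and_d(18.45, 8.37, 8, 9.97, 7.84, 28)
print(f"t = {t:.2f}, d = {d:.2f}, df = {df}")
```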
Discussion
Two measurements of vowel space reduction were found to significantly associate with patient/control status. This was largely due to a cluster of thirteen patients who had low Euclidean distance and high density measurements. These patients form a subgroup whose vowel space is notably affected. The remaining patients (n=57) did not significantly differ from controls on either vowel space measurement. Our density measure is a newly developed measure of vowel space. Previously, Euclidean distance has been found to be significantly reduced in the speech of patients with depression and PTSD (Scherer et al., 2016), Parkinson’s disease (Skodda et al., 2011), in stuttering individuals (Blomgren et al., 1998), and in people with Down syndrome (Bunton and Leddy, 2011), among others. In general, reduced vowel space has been found to correlate with decreased intelligibility.
Neither vowel space measurement nor either of the pitch variability measurements was found to correlate with clinicians’ ratings of aprosody on either the SANS or CAINS. Beyond the phonetic factors of amplitude and fundamental frequency, researchers’ reports of “flat affect” and related symptoms may include other features of speech such as enunciation, intelligibility, or even nonverbal features of communication including expressiveness of the face and body language. Our vowel space measures appear to be detecting something that is not detected by clinical raters using standard instruments to measure negative symptoms. While ED and vowel density are both measures of vowel space, ED collapses over the two dimensions (F1 and F2) and density does not. Two speakers can have very similar EDs but quite different densities if, for example, one speaker has a very large F2 range and the other has a moderate F1 and F2 range. The finding that ED and density did not correlate in controls indicates that they are capturing different aspects of the vowel space size. For the subset of 13 patients, we observed that both measures are affected, presumably because once a speaker has a notably high proportion of vowels within one standard deviation of the mean (i.e., a large density measurement), the speaker’s overall vowel range (i.e., ED) must be quite reduced.
The 13 participants with reduced vowel space were less likely to be African American, more likely to have been in special classes for learning or behavioral problems, and had better category fluency. Interpretations of these findings are unclear; they need to be replicated in other samples before conclusions can be drawn. If we were able to access the special education notes for these participants, it would be interesting to see if speech and language disturbances were noted in childhood.
While reduced vowel space only affected a subset of patients, this group stands out for articulating their vowels with less distinction than is typical. This could lead to listeners more frequently misunderstanding or not understanding their speech. Additionally, the fact that we do not see any overlap in the degree of vowel space reduction between the remaining patients and controls suggests that those patients who are affected can be clearly identified as such. Although our data were limited, characterization of the two groups indicated that, among those for whom medication dosages were available, vowel space reduction was associated with a higher antipsychotic dosage: a median olanzapine dose-equivalent of approximately 20 mg (with 20 mg being at the upper end of recommended dosing), compared to approximately 9 mg among those without reduced vowel space. This could indicate that vowel space reduction is a physiologic side effect of the medication. Prior reports indicate that antipsychotics can reduce intelligibility of speech (Sinha et al., 2015a; Sinha et al., 2015b). de Boer and colleagues (2020) recently reported that, among 41 patients with schizophrenia, more severe negative language disturbances (including slower articulation rate, increased pausing, and shorter utterances) were seen in the patients on high dopamine D2 receptor occupancy antipsychotics, while less prominent disturbances were seen in those on low D2 receptor occupancy antipsychotics. As they also note, the negative impact of high D2 receptor occupancy drugs in this area may be clinically meaningful, as impaired language production predicts functional outcome and degrades the quality of life (de Boer et al., 2020).
We acknowledge several methodological limitations. First, longitudinal studies would be required to determine the stability or reproducibility of findings and associations with both symptom severity and antipsychotic medication dosage. It would be useful to know the onset of psychotic symptoms and the initiation of antipsychotic medications, in relation to observed findings. Second, although our vowel space measures were not related to clinical symptoms, we did not measure some other variables that might be pertinent. For example, there could be associations with functioning, especially if less distinction in articulation causes listeners to have difficulty understanding speech. It would also have been of interest to have measured extrapyramidal side effects; unfortunately, we did not administer the Abnormal Involuntary Movement Scale or other measures of such symptoms.
Ultimately, these results are in need of replication. If others find a subset of patients with vowel space reduction, further characterization is warranted. Then, of course, the causes of such a finding would need to be determined. Such causes could relate to cognitive, affective/motivational, or even structural (involving the anatomy of the vocal tract and articulatory apparatus) abnormalities. Additionally, because vocal features pertain to the motor system, it would be important to consider that system as well, which is known to be compromised in many individuals with schizophrenia.
Acknowledgements:
Research reported in this publication was supported by National Institute of Mental Health grant R21 MH097999 (“Applying Computational Linguistics to Fundamental Components of Schizophrenia”) to the last author. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or National Institute of Mental Health.
Role of the Funding Source
The funding source had no role in data analyses, the writing of the manuscript, or the decision to submit it for publication.
Conflicts of Interest
The authors know of no conflicts of interest pertaining to this research.
References
- Alpert M, Rosenberg SD, Pouget ER, Shaw RJ. Prosody and lexical accuracy in flat affect schizophrenia. Psychiatry Res. 2000;97(2–3):107–118. [DOI] [PubMed] [Google Scholar]
- Alpert M, Shaw RJ, Pouget ER, Lim KO. A comparison of clinical ratings with vocal acoustic measures of flat affect and alogia. J Psychiatr Res. 2002;36(5):347–353. [DOI] [PubMed] [Google Scholar]
- Andreasen NC, 1983. The Scale for the Assessment of Negative Symptoms (SANS) Iowa City. IA: University of Iowa. [Google Scholar]
- Andreasen NC. The Scale for the Assessment of Positive Symptoms (SANS) Iowa City. IA: University of Iowa, 1984. [Google Scholar]
- Andreasen NC, Alpert M, Martz MJ. Acoustic analysis: an objective measure of affective flattening. Arch Gen Psychiatry. 1981;38(3):281–285. [DOI] [PubMed] [Google Scholar]
- Bernardini F, Lunden A, Covington M, et al. Associations of acoustically measured tongue/jaw movements and portion of time speaking with negative symptom severity in patients with schizophrenia in Italy and the United States. Psychiatry Res. 2016;239:253–258. [DOI] [PubMed] [Google Scholar]
- Blanchard JJ, Kring AM, Horan WP, Gur R. Toward the next generation of negative symptom assessments: the collaboration to advance negative symptom assessment in schizophrenia. Schizophr Bull. 2011. ;37(2):291–299. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blomgren M, Robb M, Chen Y. A note on vowel centralization in stuttering and nonstuttering individuals. Journal of Speech, Language, and Hearing Research 1998; 41(5):1042–1051. [DOI] [PubMed] [Google Scholar]
- Boersma P, Weenink D. Praat: Doing Phonetics By Computer [Computer Program]. Version 5.4.22. Retrieved 4 April 2017. from: http://www.praat.org. [Google Scholar]
- Bunton K, Leddy M. An evaluation of articulatory working space area in vowel production of adults with Down syndrome. Clinical Linguistics and Phonetics 2011;25:4:321–334. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Clopper C, Pierrehumbert J. Effects of semantic predictability and regional dialect on vowel space reduction. Journal of the Acoustical Society of America 2008;124:1682–1688. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cohen AS, Alpert M, Nienow TM, Dinzeo TJ, Docherty NM. Computerized measurement of negative symptoms in schizophrenia. J Psychiatr Res. 2008;42(10):827–836. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Compton MT, Lunden A, Cleary SD, et al. The aprosody of schizophrenia: computationally derived acoustic phonetic underpinnings of monotone speech. Schizophr Res. 2018;197:392–399. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Covington MA, Lunden SA, Cristofaro SL, et al. Phonetic measures of reduced tongue movement correlate with negative symptom severity in hospitalized patients with first-episode schizophrenia-spectrum disorders. Schizophr. Res 2012;142(1-3):93–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Crosswhite K. Vowel Reduction in Optimality Theory. New York: Routledge, 2001. [Google Scholar]
- de Boer JN, Voppel AE, Brederoo SG, Wijnen FNK, Sommer IEC. Language disturbances in schizophrenia: the relation with antipsychotic medication. npj Schizophr 2020;6, 24. 10.1038/s41537-020-00114-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Forbes C, Blanchard JJ, Bennett M, Horan WP, Kring A, Gur R. Initial development and preliminary validation of a new negative symptom measure: the Clinical Assessment Interview for Negative Symptoms (CAINS). Schizophr. Res. 2010;124(1-3):36–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fourakis M. Tempo, stress, and vowel reduction in American English. Journal of the Acoustical Society of America 1991;90:1816–1827. [DOI] [PubMed] [Google Scholar]
- Halle M. The Sound Pattern of Russian. The Hague: Mouton, 1959. [Google Scholar]
- Jastak S, Wilkinson G. The Wide Range Achievement Test: Manual of Instructions. Wilmington, DE: Jastak Associates, 1984. [Google Scholar]
- Kern RS, Nuechterlein KH, Green MF, et al. The MATRICS Consensus Cognitive Battery, Part 2: co-norming and standardization. Am J Psychiatry. 2008;165:214–220. [DOI] [PubMed] [Google Scholar]
- Kliper R, Portuguese S, Weinshall D. 2015. Prosodic analysis of speech and the underlying mental state. Commun Comput Inf Sci. 2015;604:52–62. [Google Scholar]
- Kliper R, Vaizman Y, Weinshall D, Portuguese S. Evidence for depression and schizophrenia in speech prosody. Third ISCA Workshop on Experimental Linguistics, 2010. [Google Scholar]
- Lehiste I, Popov K. Akustiche Analyse bulgarischer Silbenkerne. Phonetica 1970;21:40–48. [Google Scholar]
- Lennes M. Praat Script. Modified by Dan McCloy, December 2011; Downloaded 14 January 2016. https://depts.washington.edu/phonlab/resources/getDurationPitchFormants.praat. [Google Scholar]
- Leucht S, Samara M, Heres S, Davis JM. Dose equivalents for antipsychotic drugs: The DDD method. Schizophrenia Bulletin 2016;42(Suppl. 1):S90–S94. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lieberman P Some effects of semantic and grammatical context on the production and perception of speech. Language and Speech 1963;6:172–187. [Google Scholar]
- Lindblom B. Spectrographic study of vowel reduction. Journal of the Acoustical Society of America 1963;35:1773–1781. [Google Scholar]
- Martínez-Sánchez F, Muela-Martínez JA, Cortés-Soto P, et al. Can the acoustic analysis of expressive prosody discriminate schizophrenia? Span J Psychol. 2015;2:18:E86. [DOI] [PubMed] [Google Scholar]
- McGilloway S. Negative symptoms and speech parameters in schizophrenia. Ph.D. thesis, Queen’s University, Belfast, 1997. [Google Scholar]
- Meaux LT, Mitchell KR, Cohen AS. Blunted vocal affect and expression is not associated with schizophrenia: a computerized acoustic analysis of speech under ambiguous conditions. Compr Psychiatry. 2018;83:84–88. [DOI] [PubMed] [Google Scholar]
- Mertens P. The Prosogram: semi-automatic transcription of prosody based on atonal perception model. Speech Prosody 2004, International Conference. [Google Scholar]
- Nuechterlein KH, Green MF, Kern RS, et al. The MATRICS Consensus Cognitive Battery, Part 1: test selection, reliability, and validity. Am J Psychiatry. 2008;165:203–213. [DOI] [PubMed] [Google Scholar]
- Pierrehumbert J. The phonology and phonetics of English intonation. Ph.D. thesis, Massachusetts Institute of Technology, 1980. [Google Scholar]
- Püschel J, Stassen HH, Bomben G, Scharfetter C, Hell D. Speaking behavior and speech sound characteristics in acute schizophrenia. J Psychiatr Res. 1998;32(2):89–97. [DOI] [PubMed] [Google Scholar]
- Ross ED, Orbelo DM, Cartwright J, et al. Affective-prosodic deficits in schizophrenia: profiles of patients with brain damage and comparison with relation to schizophrenic symptoms. J Neurol Neurosurg Psychiatry. 2001;70(5):597–604. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rapcan V, D’Arcy S, Yeap S, Afzal N, Thakore J, Reilly RB. Acoustic and temporal analysis of speech: a potential biomarker for schizophrenia. Med Engineering Physics. 2010;32(9):1074–1079. [DOI] [PubMed] [Google Scholar]
- Scherer S, Lucas GM, Gratch J, Rizzo AS, Morency L-P. Self-reported symptoms of depression and PTSD are associated with reduced vowel space in screening interviews. IEEE Transactions on Affective Computing 2016;7(1):59–73. [Google Scholar]
- Shue Y-L. The Voice Source in Speech Production: Data, Analysis and Models. Doctoral dissertation. 2010. University of California, Los Angeles: http://www.seas.ucla.edu/spapl/voicesauce. [Google Scholar]
- Sinha P, Vandana VP, Lewis NV, Jayaram M, Enderby P. Evaluating the effect of risperidone on speech: a cross-sectional study. Asian J. Psychiatr 2015a;15:51–55. [DOI] [PubMed] [Google Scholar]
- Sinha P, Vandana VP, Lewis NV, Jayaram M, Enderby P. Predictors of effect of atypical antipsychotics on speech. Indian J Psychol Med. 2015b;37:429–433. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Skodda S, Visser W, Schlegel U. Vowel articulation in Parkinson’s disease. Journal of Voice 2011;25(4):467–472. [DOI] [PubMed] [Google Scholar]
- Titze IR. On the relation between subglottal pressure and fundamental frequency in phonation. J Acoustical Soc America. 1989;85(2):901–906. [DOI] [PubMed] [Google Scholar]



