Philosophical Transactions of the Royal Society B: Biological Sciences. 2014 Dec 19;369(1658):20130404. doi: 10.1098/rstb.2013.0404

Quantification of rhythm problems in disordered speech: a re-evaluation

Anja Lowit
PMCID: PMC4240971  PMID: 25385782

Abstract

Disordered speech can present with rhythmic problems, impacting on an individual's ability to communicate. Effective treatment relies on the availability of sensitive methods to characterize the problem. Rhythm metrics based on segmental durations originally designed for cross-linguistic research have the potential to provide such information. However, these measures may be associated with problems that impact on their clinical usefulness. This paper aims to address the perceptual validity of cross-linguistic metrics as indicators of rhythmic disorder. Speakers with dysarthria and matched healthy participants performed a range of tasks, including syllable and sentence repetition and a spontaneous monologue. A range of rhythm metrics as well as clinical measures were applied. Results showed that none of the metrics could differentiate disordered from healthy speakers, despite clear perceptual differences, suggesting that factors beyond segment duration impacted on rhythm perception. The investigation also highlighted a number of areas where caution needs to be exercised in the application of rhythm metrics to disordered speech. The paper concludes that the underlying speech impairment leading to the perceptual and acoustic characterization of rhythmic problems needs to be established through detailed analysis of speech characteristics in order to construct effective treatment plans for individuals with speech disorders.

Keywords: rhythm metrics, dysarthria, acoustic, rhythm disorders, ataxia, Parkinson's disease

1. Introduction

People with communication deficits can present with a wide range of speech impairments, including disordered rhythm. Any problem that disturbs the natural flow of speech could essentially lead to deviations in rhythmic structure, such as a stammer, a problem finding the correct word, or a difficulty in producing speech sounds in the correct sequence. However, not all of these are necessarily perceived as disordered rhythm. Instead, the term is primarily associated with the changes in speech timing and the poor coordination between articulatory systems experienced by speakers with neurogenic speech disorders.

Neurogenic speech problems are also referred to as motor speech disorders (MSDs), which are defined as ‘a group of speech disorders resulting from disturbances in muscular control—weakness, slowness or incoordination of the speech mechanism—due to damage to the central or peripheral nervous system or both’ [1]. There are a number of different types of MSDs, which are distinguishable by their neuropathology, i.e. the place of lesion in the nervous system, and their symptomatology, i.e. the resulting speech problem. Causes for MSDs range from vascular (stroke) to traumatic (traumatic brain injury), degenerative (multiple sclerosis (MS), Parkinson's disease (PD), motor neurone disease, etc.), neoplastic (tumour) and infectious (e.g. meningitis) problems.

The most common type of MSD is dysarthria, which can affect any combination of speech subsystems, i.e. respiration, phonation, articulation and velopharyngeal control. Currently, seven types of dysarthria are recognized in the literature: flaccid, spastic, ataxic, hyperkinetic, hypokinetic, mixed (flaccid/spastic or spastic/ataxic) and unilateral upper motor neurone dysarthria [2]. The differentiation into types is largely based on the neurological classification of muscle tone and movement disorder: spastic dysarthria is due to excess muscle tone and thus results in strained speech production, whereas flaccid dysarthria is related to a decrease in muscle tone and therefore results in weaker articulation patterns and a reduction in loudness. There are also differences in terms of which subsystems are affected and to what degree, e.g. some dysarthrias impact most on prosodic features such as vocal loudness, voice quality or intonation, whereas others are more detrimental to the articulation of speech sounds. Similarly, some types cause a reduction in speech tempo, whereas others have preserved or even accelerated rate. Irrespective of these variations, any type of dysarthria tends to result in reduced intelligibility and naturalness of speech, impacting on the person's ability to communicate effectively and thus on their quality of life.

This paper focuses specifically on hypokinetic and ataxic dysarthria as these are commonly reported to present with speech timing deficits. In addition, they differ significantly in their presentation and thus lend themselves to evaluations of how sensitive speech analysis measures are to performance differences. Hypokinetic dysarthria, which is mostly associated with PD, is characterized by poor breath support resulting in a reduction in utterance length, short rushes of speech and inappropriate pausing behaviour, low speech volume and changes to voice quality, impaired articulation, monotonous intonation and, in some cases, accelerated speech tempo [2–7]. Ataxic dysarthria, on the other hand, is linked to cerebellar problems, e.g. cerebellar stroke or degenerative diseases such as (spino-)cerebellar ataxia, Friedreich's ataxia (FA) or MS. The resulting speech disorder is characterized by irregular breakdown of articulator movements, inappropriate loudness and pitch excursions, as well as changes in voice quality, slow rate, equalized stress and syllabic timing of speech movements [2,8–12]. The latter is also referred to as scanning speech [9,13], which in severe cases can result in a syllable-by-syllable production of speech.

Effective treatment of dysarthria by speech and language therapists depends on accurate characterization of its symptoms. While this continues to be performed primarily by perceptual means in the clinical environment, instrumental methods have also been developed for a number of speech features over the years to aid diagnosis and allow the quantification of treatment outcomes. Any acoustic technique used in the characterization of disordered speech features tends to be based on developments in research on unimpaired populations, and rhythm is no exception. Clinical research in this field has focused primarily on techniques developed for the study of cross-linguistic differences, which have been of interest to phoneticians for some time.

Early characterizations of rhythm in this context were based on perceptual evaluations of speakers and resulted in three categories: stress-, syllable- and mora-timed rhythms [14,15]. English and German were generally considered good representatives of stress timing, French and Spanish of syllable timing and Japanese of mora timing. These classifications centred around the concept of isochrony, or equality of duration. In stress-timed languages, stress groups or feet were perceived as being of equal duration, whereas in syllable-timed languages, syllables were considered to be isochronous.

These early perceptual descriptions of rhythm were soon superseded by acoustic measures which allowed researchers to capture speech segment duration from audio recordings with an accuracy of a fraction of a millisecond. On the basis of these data, researchers realized that some of the perceptual concepts developed around rhythm could not be maintained. In particular, the notion of isochrony was not supported by the durational measures; instead, stress groups and syllables tended to vary in length irrespective of whether a language was stress- or syllable-timed. In addition, the acoustic data suggested that the distinction between stress- and syllable-timed languages was not as clear-cut as originally thought, but rather formed a continuum. Nevertheless, the original idea of what defined a rhythm class was maintained and speech timing remained at the forefront of researchers' interest in the attempts to capture rhythmic differences between languages. In particular, vowel duration featured heavily in the quantification of rhythm, although other segmental units have also been employed either in isolation or in combination with the vowel measures.

Table 1 provides an illustration of the main methods that have been developed to capture speech timing on this basis. Some metrics purely look at the proportion of vowels in the acoustic signal (%V [16]), based on the assumption that syllable-timed languages, which vary vowel length relatively little, will have a higher proportion of vocalic segments in the signal than stress-timed languages, which alternate between long and short or reduced vowels. Other measures focus directly on this variability in vowel length, either employing the standard deviation (ΔV [16]) or coefficient of variation (COV) of vowel duration (VarcoV [17]), or measuring the difference in duration between successive vowel pairs (pairwise variability index, PVI [18] or nPVI-V [19]). As vowel duration is closely tied to speech rate, some of the above measures are normalized for rate (VarcoV and nPVI-V).
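To make the vocalic metrics concrete, the following minimal Python sketch computes %V, ΔV, VarcoV, the raw PVI and the nPVI from lists of hand-labelled interval durations. It is not the spreadsheet from Liss et al. [20] used in this study: the function names are hypothetical, the sample standard deviation is assumed for ΔV (published implementations may use a different convention), and the durations are invented for illustration.

```python
import statistics


def percent_v(vowels, consonants):
    """%V: vocalic share of the total V + C duration, as a percentage."""
    total_v = sum(vowels)
    return 100.0 * total_v / (total_v + sum(consonants))


def delta(durations):
    """ΔV (or ΔC): standard deviation of interval durations (sample SD assumed)."""
    return statistics.stdev(durations)


def varco(durations):
    """VarcoV (or VarcoC): standard deviation divided by the mean, x 100."""
    return 100.0 * statistics.stdev(durations) / statistics.mean(durations)


def raw_pvi(durations):
    """Raw PVI: mean absolute difference between successive intervals (s)."""
    pairs = list(zip(durations, durations[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def npvi(durations):
    """nPVI: each successive difference divided by the pair mean, averaged, x 100."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


# Hypothetical vocalic (V) and consonantal (C) interval durations in seconds.
vowels = [0.10, 0.20, 0.10]
consonants = [0.05, 0.05, 0.10]

print(round(percent_v(vowels, consonants), 2))  # 66.67
print(round(npvi(vowels), 2))                   # 66.67
print(round(varco(vowels), 2))                  # 43.3
```

Because VarcoV and the nPVI divide by a mean duration, uniformly slower speech leaves them unchanged, which is the sense in which they are rate-normalized; ΔV and the raw PVI scale with the durations.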

Table 1.

Summary of prominent rhythm metrics reported in the cross-linguistic literature. (See the text for explanation of terms.)

measurement unit | metrics
vowel | %V [16], ΔV [16], VarcoV [17], PVI [18], nPVI-V [19]
consonant | ΔC [16], VarcoC [17], rPVI-C [19]
syllable | VarcoVC [20], VI [21]
stress group | ISI [22]

In addition to the vowel measures, some researchers have proposed investigating consonantal segments. This is based on the notion that languages differ not only in their vowel duration but also in the structure of the remaining syllable components. For example, stress-timed languages tend to be rich in consonant clusters, whereas syllable-timed languages predominantly include simple consonant-vowel combinations [16]. Currently available consonantal measures include the ΔC [16], VarcoC [17] and rPVI-C [19] measures. Note that these consonant measures tend not to be rate normalized, as consonant durations vary less across different speaking rates than vowels.
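The consonantal measures can be sketched in the same way. In the hypothetical example below, one set of consonantal intervals is stretched uniformly by a factor of 1.5 to mimic slower speech: ΔC and rPVI-C grow with the durations, whereas VarcoC, being divided by the mean, does not. Uniform stretching is an idealization, since consonants actually change less with rate than vowels; the function names and values are illustrative assumptions, and the sample standard deviation is assumed for ΔC.

```python
import statistics


def delta_c(cons):
    """ΔC: standard deviation of consonantal interval durations (sample SD assumed)."""
    return statistics.stdev(cons)


def varco_c(cons):
    """VarcoC: ΔC divided by the mean consonantal duration, x 100."""
    return 100.0 * statistics.stdev(cons) / statistics.mean(cons)


def rpvi_c(cons):
    """rPVI-C: mean absolute difference between successive consonantal intervals (s)."""
    diffs = [abs(a - b) for a, b in zip(cons, cons[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical consonantal intervals (s), and the same utterance
# spoken uniformly 1.5 times more slowly.
normal = [0.06, 0.12, 0.08, 0.10]
slow = [1.5 * d for d in normal]

for name, metric in [("ΔC", delta_c), ("VarcoC", varco_c), ("rPVI-C", rpvi_c)]:
    print(name, round(metric(normal), 4), round(metric(slow), 4))
```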

Finally, a number of metrics have gone beyond vowels or consonants as their unit of measurement and look at the variability of syllable duration (VarcoVC [20], variability index (VI) [21]) or whole stress groups (ISI [22]) to characterize rhythm.

The application of the above metrics in clinical research was based on the fact that some of the differences observed between typical and disordered rhythmic performance appeared to mirror the cross-linguistic distinction between stress- and syllable-timed languages. It therefore seemed likely that the measures would be able to identify deviations from normal rhythm and so act as a diagnostic tool. Furthermore, the fact that cross-linguistic rhythm metrics were able to reflect the continuum between stress and syllable timing suggested their suitability to quantify the extent of deviation from normal rhythmic performance in impaired populations. This feature would be important in terms of judging the severity of the disorder and would allow the metric to function as a therapy outcome measure to indicate potential improvement in performance after treatment.

In the attempt to investigate whether rhythm metrics were indeed valid and reliable tools to capture disorders of speech timing, researchers applied a wide range of the above measures. Of these, the PVI was one of the first to be applied to clinical speech [23–25]. Another early attempt involved the application of the ISI to Swedish speakers with dysarthria [26]. These studies demonstrated that the measures could successfully differentiate between groups of disordered participants and matched healthy controls. Encouraged by these results, Liss et al. [20] investigated the nPVI-V, as well as ΔV and ΔC and measures of syllable variability (VarcoVC, nPVI-VC and rPVI-VC), with the aim of assessing which were most suitable to distinguish healthy controls from speakers with dysarthria, as well as different types of dysarthria from each other. They found that variants of the PVI and Varco metrics were particularly successful in discriminating between speaker groups, but that the focus of the comparison determined which of the measures was optimal, i.e. a metric might be better suited to identify speakers with PD than those with ataxia, and in some cases, a combination of predictor variables was most effective in differentiating speaker groups. It is noteworthy that in a subsequent study by Kim et al. [27], the PVI was not successful in distinguishing different types of dysarthric speakers from each other (no results are reported in relation to healthy controls). The authors state that this might have been partly due to the use of the non-normalized version of the PVI, as opposed to the rate-normalized nPVI-V from the Liss et al. [20] study. However, it could also have been the case that the PVI was unable to capture the particular characteristic that differentiated their speaker groups, similar to Liss et al.'s [20] findings that the best distinguishing metric depended on the type of disorder under investigation.

Kim et al.'s [27] results aside, there are thus a small number of clinical research reports based on group data which confirm the suitability of cross-linguistic metrics for the quantification of type and severity of disordered rhythm. However, before these measures can be fully accepted as valid tools, we need to take a step back and consider whether they can indeed capture the intricacies of rhythmic performance in a disordered population in a clinically useful way. In order for these measures to function as effective diagnostic markers, they need to be able to not only indicate the presence of speech timing changes, but also to characterize their nature. The most significant shortcoming of the research cited above is the fact that none of these studies validated their acoustic results with perceptual measures. While each of the disorders investigated to date (ataxic, hypokinetic, hyperkinetic as well as mixed spastic-flaccid dysarthria) no doubt showed differences in speech timing compared with typical speakers as reflected by the rhythm metrics employed, the question arises to what degree these deviations actually corresponded to the perceptual notion of distorted rhythm. This issue is underlined by the fact that traditionally, only ataxic dysarthria has been associated with rhythmic problems although timing issues in general are at the forefront of most types of MSDs. The only study to date that seems to have considered perceptual ratings in combination with acoustic metrics is Henrich et al. [28]. However, they applied a smaller range of acoustic metrics to their data than Liss et al. [20] and, in addition, only included speakers with ataxic dysarthria. Although their results thus provide an indication that some measures (PVI, ISI) correlated better with perceptual judgements of disordered rhythm than others (%V), they cannot shed light on the question whether any deviation picked up by an acoustic metric corresponds to a perceived disorder of rhythm.

There are further methodological problems associated with the application of metrics to disordered speech beyond the lack of validation alluded to above. These pertain to the choice of speech elicitation task and the measurement conventions used. In relation to the latter, there has been little discussion in the clinical literature of the potential effects of distorted speech data on the ability to identify the acoustic landmarks normally applied for the metrics. As explained above, the majority of acoustic rhythm measures are based on either vowel or consonant duration. This implies that segment boundaries need to be accurately identified for the metric to produce reliable and valid results. Yet, the articulatory deviations associated with disordered speech might affect this level of analysis. For example, many speakers with MSD show a certain level of hypernasality, which can blur the boundaries between nasal consonants and vowels. Similarly, poor laryngeal control can result in the devoicing of vowels, potentially leading researchers to inappropriately attribute parts of the waveform to consonant duration rather than the vowel interval. No study to date has addressed the impact of articulatory deviations in disordered speech on the results of rhythm metrics, again raising the question whether the observed differences in speech timing can in fact be equated with perceivable rhythm problems or might be artefacts of other articulatory symptoms.
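The effect of a misplaced boundary can be illustrated numerically. The sketch below assumes a strict consonant-vowel alternation (C0 V0 C1 V1 C2 V2) and moves the first 30 ms of one vowel into the preceding consonantal interval, as might happen if a devoiced vowel onset were labelled as part of the consonant. The total duration is unchanged, yet %V falls from about 61% to about 56% and the nPVI-V rises from about 50 to about 91. The helper names and interval values are hypothetical.

```python
def percent_v(vowels, consonants):
    """%V: vocalic share of the total V + C duration, as a percentage."""
    total_v = sum(vowels)
    return 100.0 * total_v / (total_v + sum(consonants))


def npvi(durations):
    """nPVI over successive intervals, x 100."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


def devoice_onset(vowels, consonants, index, seconds):
    """Relabel the first `seconds` of vowel `index` as consonant.

    Assumes a strict C V C V ... alternation, so consonant `index`
    immediately precedes vowel `index`.
    """
    v, c = list(vowels), list(consonants)
    v[index] -= seconds
    c[index] += seconds
    return v, c


# Hypothetical intervals (s) for C0 V0 C1 V1 C2 V2.
vowels = [0.12, 0.08, 0.15]
consonants = [0.07, 0.09, 0.06]
v_err, c_err = devoice_onset(vowels, consonants, index=1, seconds=0.03)

print("%V:", round(percent_v(vowels, consonants), 2), "->", round(percent_v(v_err, c_err), 2))
print("nPVI-V:", round(npvi(vowels), 2), "->", round(npvi(v_err), 2))
```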

Finally, existing studies can be criticized in relation to their lack of variety of elicitation tasks. It is by now well accepted that the phonotactic nature of the speech material can have substantial effects on rhythm in healthy speakers [29–31]. These effects can be exacerbated in clinical research, i.e. not only do disordered speakers vary in their rhythmic performance across tasks, the extent of difference to typical populations can also change. This is clearly highlighted by Henrich et al. [28], whose speakers with ataxic dysarthria performed within normal limits while reading a passage or a limerick, but differed significantly in their PVI values for spontaneous speech. While it may not be feasible for a single investigation to cover a wide variety of dysarthria types, rhythm measures and elicitation tasks, the choice of task clearly needs to be carefully controlled, and ideally more research should focus on the differences across tasks in order to establish the optimum assessment task for a disordered speaker and the extent of variability that can be expected across different tasks.

This paper aims to address two of the above concerns and investigates in detail how individual speech production characteristics relate to acoustic-based metrics of speech timing in order to establish (i) to what degree they can act as valid diagnostic markers for rhythmic disorder and (ii) whether any methodological issues might have to be taken into account in applying such methods to a clinical population. While the issue of elicitation method is not directly addressed in this study, it is taken into account in the design by including data from a variety of speech tasks.

2. Material and methods

(a). Participants

The detailed analysis required to answer the above questions precluded the use of a large sample. Instead, six participants with speech disorders, as well as six age- and gender-matched healthy control speakers, were selected from a larger pool of speakers from an existing study [32]. This particular study was chosen as it included speakers with types of dysarthria that had previously been associated with rhythmic deviations, i.e. ataxic and hypokinetic dysarthria [20]. Inclusion criteria for the larger study comprised normal or corrected-to-normal vision and hearing, normal cognitive skills (as determined by a dementia screening test [33]), sufficient educational level to perform a reading task and being a native speaker of Scottish English. Unimpaired speakers had to have no medical history of speech or language difficulties. Disordered speakers were diagnosed as presenting either with ataxic or hypokinetic dysarthria of mild to moderate degree (as established by the referring health professional or the experimenter in cases of self-referral) and no history of speech or language problems other than their dysarthria. Participant selection for this study was based on a review of the acoustic speech data available for each speaker as part of the existing analysis (investigating speech rate, pausing and variability of speech performance), as well as a perceptual evaluation by two experienced listeners, indicating clearly perceivable difficulties with speech timing (table 2).

Table 2.

Participant information. AD, ataxic dysarthria; HD, hypokinetic dysarthria; F, female; M, male; FA, Friedreich's ataxia; SCA, spino-cerebellar ataxia; MS, multiple sclerosis; IPD, idiopathic Parkinson's disease; n.a., no intelligibility analysis was conducted for the control speakers as this measure served to indicate severity of dysarthria which was absent by default in the unimpaired speakers.

participant | age | gender | diagnosis | intelligibility (%) | medication
AD2 | 38 | F | FA | 58 |
AD4 | 58 | M | SCA 8 | 47 |
AD5 | 44 | F | MS | 72 |
HD9 | 54 | M | IPD | 94 | Madopar 8 × 50/12.5 mg; Stalevo 8–10 × 0/200/200 mg
HD11 | 75 | M | IPD | 72 | Madopar 3 × 50/12.5 mg; Sinemet-Plus 6 × 25/100 mg
HD14 | 75 | F | IPD | 88 | Ropinirole 3 × 8 mg; Sinemet 1 × 50/200 mg
dysarthric group | mean 57.3, s.d. 15.4 | three male, three female | | |
control group | mean 56.2, s.d. 15.7 | three male, three female | n.a. | n.a. | n.a.

(b). Recording procedure

Participants were seen in their own homes, at Strathclyde University or at their local health centre. Recording sessions lasted around 40–60 min and included participant interview, cognitive testing and speech recordings. Audio recordings were taken using a portable wave recorder (Edirol R-09HR) and a head-mounted condenser microphone (AKG C-420) spaced about 4 cm from the speaker's mouth.

(c). Materials, measurement parameters and analysis

The original study tested speakers on two experimental tasks commonly used in clinical speech research to determine the speech timing characteristics, as well as three further speech tasks to evaluate the intelligibility of the participants with dysarthria (sentence repetition, passage reading and a monologue). The current investigation focused on the same experimental tasks to investigate timing and rhythm acoustically and perceptually. In addition, the monologue data were evaluated perceptually in order to relate the experimental findings to a more natural speech performance. All acoustic analyses for this study were performed by the author rather than the experimenter involved in data collection and analysis of the original investigation. Around 10% of the sentence repetition data were re-analysed by a second experimenter. Spearman's ρ correlation analysis between the two sets of measures showed good agreement with rs = 0.895, p < 0.001. The perceptual analyses were conducted by the author and two other listeners experienced in the evaluation of MSDs. There was good agreement between listeners with an intraclass correlation coefficient of r = 0.89, p < 0.01.
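The inter-rater check on the labelling can be reproduced in outline with a rank correlation. The sketch below computes Spearman's ρ as the Pearson correlation of average ranks (ties share the mean of their ranks); it omits the significance test, and the paired durations are invented for illustration rather than taken from the study. The software actually used for the reliability statistics is not stated in the text.

```python
import statistics


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # ranks are 1-based
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical interval durations (s) measured by two labellers.
labeller_a = [0.112, 0.085, 0.143, 0.067, 0.098]
labeller_b = [0.108, 0.100, 0.150, 0.070, 0.095]
print(round(spearman_rho(labeller_a, labeller_b), 3))
```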

(d). Experimental task 1

The speakers performed a sentence repetition task where they produced ‘Tony knew you were lying in bed’ approximately 20 times as regularly as possible at their habitual speech rate. This task was used in the larger study [32] to determine the variability of motor programmes generated by the speakers across the repetitions. Such investigations usually employ kinematic measures of lip and jaw movements (e.g. [34]) and our test sentence was specifically designed to mirror these task characteristics for the purposes of acoustic analysis. It also lent itself well to further rhythmic analysis due to the alternation of short and long vowels typical of English stress timing, hence the decision to base the current analysis on this existing dataset. Sentence repetition is not commonly used to investigate rhythm and performance might differ from natural speech production due to adaptation or habituation effects. However, this task had the advantage that it allowed clearer comparisons between speech performance and rhythm results as several examples of the same utterance were available. It was therefore possible to observe how small changes in articulatory behaviour might impact on the results of the rhythm metrics. This was particularly important given the small number of participants investigated in this study. It was ensured that there were no differences in regularity of repetitions between disordered speakers and healthy controls based on the results of the original investigation, and the task was therefore considered appropriate for this study.

Measurement parameters for this dataset were based on Liss et al.'s [20] investigation and included the ΔV, ΔC, %V, nPVI-V, rPVI-C, VarcoV, VarcoC, rPVI-VC and nPVI-VC. Given that Liss et al. [20] had not validated their results perceptually, the full range of metrics was employed in order to establish whether any one of these might be more representative of the disordered performance than others. Calculations of these metrics were performed for five consecutive utterances for each participant, taken from the middle of the repetition sequence. Vowel and consonant intervals were labelled by hand on the spectrographic signal in Praat (v. 5.1.32, [35], see figure 1 for a screenshot of a typical analysis window). The final consonant in the utterance (/d/ in ‘bed’) was excluded from analysis as the release burst was not always visible on the spectrogram. Measurement conventions followed those prescribed for the nPVI-V [19], i.e. adjacent consonants or vowels were labelled as one single C or V interval, and syllabic consonants were labelled as vowels. Once segment boundaries were in place, the interval durations were extracted with a customized Praat script. These were subsequently entered into an Excel spreadsheet available from Liss et al. [20] which automatically calculates the various rhythm measures applied in their (and the current) study.¹ However, there was one change in procedure from published conventions which related to the method for deriving the rhythm score for each participant. Normally, rhythm measures would be based on a connected speech sample, e.g. a person reading a short passage or producing a monologue. In this case, individual sentences would not be separated for analysis, and CV intervals across utterance boundaries would be treated in the same way as those within sentences, e.g. the durational difference between /m/ and /ai/ would be calculated in the same way in ‘ … him. I … ’ as in ‘ … my … ’. This ensures that utterance-final lengthening is considered as part of the rhythm measure. This convention was not observed in this study, because the repetitive nature of the task led to considerable variation between speakers in terms of how long the pause would be between repetitions, and thus also of how much they slowed down towards the end of a sentence. In order to exclude the effects of this inter-speaker variability, the rhythm score was calculated separately for each sentence and then averaged to arrive at a single result. This method furthermore allowed the researcher to investigate the impact of different articulatory patterns on rhythm metrics across the repetitions.
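The difference between the conventional pooled scoring and the per-sentence averaging adopted here can be sketched as follows. In the pooled version the last vowel of one repetition is paired with the first vowel of the next; in the per-sentence version such cross-boundary pairs never enter the score. The function names and durations are hypothetical, and the nPVI stands in for any of the pairwise metrics.

```python
import statistics


def npvi(durations):
    """nPVI over successive intervals, x 100."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


def per_sentence_npvi(sentences):
    """Score each repetition separately, then average (no cross-boundary pairs)."""
    return statistics.mean(npvi(s) for s in sentences)


def pooled_npvi(sentences):
    """Connected-speech convention: concatenate all intervals, so pairs
    spanning utterance boundaries are included."""
    return npvi([d for s in sentences for d in s])


# Hypothetical vocalic intervals (s) for two repetitions; the pair
# (0.20, 0.30) spanning the boundary is counted only when pooling.
reps = [[0.10, 0.20], [0.30, 0.10]]
print(round(per_sentence_npvi(reps), 2), round(pooled_npvi(reps), 2))
```

Each repetition needs at least two intervals of the relevant type for its own score to be defined.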

Figure 1.

Screenshot of a Praat window showing the analysis tiers for the rhythmic analysis. The first window shows the oscillographic signal, and the second window the spectrogram with overlaid F0 (blue, bottom line) and intensity contours (yellow, top line). Underneath are the tiers defined for this particular analysis. The first was a comments tier which was only completed where necessary. The next tier marks the pauses (p) between utterances, this information was used to calculate the articulation rate. The third tier marks the vowel (v) and consonant (c) boundaries within the signal, which formed the basis of the calculation of the rhythm metrics. The final tier adds an orthographic transcription to these intervals for reference purposes. (Online version in colour.)

In addition to the rhythm metrics, articulation rate was measured in syllables per second by dividing the number of syllables produced in each utterance by its duration, excluding any intra-utterance pauses. Furthermore, the number of syllables produced by each participant was noted.
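The articulation-rate calculation can be written as a minimal sketch, assuming utterance boundaries and intra-utterance pauses are available as times in seconds (for example, from the pause tier shown in figure 1). The function name and example values are hypothetical.

```python
def articulation_rate(n_syllables, start, end, pauses=()):
    """Syllables per second over one utterance, excluding intra-utterance pauses.

    `pauses` is a sequence of (pause_start, pause_end) tuples in seconds,
    assumed to lie within [start, end] and not to overlap.
    """
    pause_time = sum(p_end - p_start for p_start, p_end in pauses)
    speaking_time = (end - start) - pause_time
    if speaking_time <= 0:
        raise ValueError("no speaking time left after removing pauses")
    return n_syllables / speaking_time


# Hypothetical utterance: 8 syllables over 2.5 s containing a 0.5 s pause.
print(articulation_rate(8, 0.0, 2.5, pauses=[(1.2, 1.7)]))
```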

The perceptual analysis consisted of listeners rating the normality of rhythm across all repetitions, thus arriving at a single score between 1 (normal rhythm) and 5 (severely disordered rhythm) for each speaker.

(e). Experimental task 2

Motor speech deficits are often assessed with a diadochokinetic (DDK) task where speakers are asked to repeat single syllables, most commonly ‘pa, ta and ka’, as fast as possible for around 5 s. This task requires the patient to produce speech with an unfamiliar timing structure (isochronous syllable lengths), as well as at a faster than normal rate. Owing to this increased complexity, the task has the potential to highlight difficulties in timing control which might not yet be obvious in more natural speech tasks. Traditionally, the focus of analysis, whether in perceptual or acoustic research, is the rate of repetition, although some researchers have also reported on variability [36]. This task was included in the current investigation to assess whether common methodological issues exist across two very distinct tasks and measurement approaches.

To assess the regularity of production in the DDK task, the same listeners as for task 1 scored this feature on a 5-point scale, with 1 signalling normal and 5 highly irregular production. This was done separately for each of the three syllable types. Acoustically, regularity was quantified by measuring the COV of syllable duration. The COV was preferred over the standard deviation as a variability measure as it normalizes for differences in the mean, which were highly likely in the current participant group given that speakers with MSDs frequently show reductions in DDK rate. The acoustic analysis was also conducted separately for each syllable type.

The procedure consisted of hand-labelling syllable duration in Praat based on the spectrographic and oscillographic signal. The measurement interval was defined as the period from one consonant release burst to the next. The initial and final items of the syllable stream were excluded from analysis, to reduce bias from speech initiation difficulties or final lengthening effects.
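The DDK regularity measure can be sketched as follows, assuming the labelled boundaries are the successive release-burst times followed by the end of the final syllable, so that every syllable has a duration. The first and last syllables are dropped before computing the COV, as described above; the sample standard deviation is an assumption, and the function name and times are hypothetical.

```python
import statistics


def ddk_cov(boundaries):
    """COV (%) of DDK syllable durations, measured burst to burst.

    `boundaries`: release-burst times (s) plus the offset of the final
    syllable. The first and last syllables are excluded to reduce
    initiation and final-lengthening effects.
    """
    durations = [b - a for a, b in zip(boundaries, boundaries[1:])]
    trimmed = durations[1:-1]
    if len(trimmed) < 2:
        raise ValueError("need at least two syllables after trimming")
    return 100.0 * statistics.stdev(trimmed) / statistics.mean(trimmed)


# Hypothetical /pa/ sequence: syllable durations 0.3, 0.2, 0.3, 0.2, 0.4 s.
print(round(ddk_cov([0.0, 0.3, 0.5, 0.8, 1.0, 1.4]), 2))
```

Because the standard deviation is divided by the mean, a slower speaker with the same relative irregularity obtains the same COV, which is the property that motivated its use here.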

(f). Monologue

The monologue task was evaluated perceptually by the author, focusing in particular on whether the segmental speech characteristics observed in the sentence repetition task were reflected in spontaneous speech.

(g). Statistical analysis

Given the small sample size and the fact that some of the participants had a speech disorder, non-parametric statistical tests were applied to the data. To perform three-way group comparisons (control, ataxic and hypokinetic dysarthria), the Kruskal–Wallis test was applied, with the Mann–Whitney U-test used for post hoc analysis. In line with Nakagawa [37] and Perneger [38], it was decided not to conduct a Bonferroni correction given the exploratory nature of this investigation, which necessitated the inclusion of a large number of variables. Instead, statistical results were cross-checked with individual speaker performance and greater caution was exercised when interpreting positive statistical results to ensure any differences identified by the analysis were meaningful.
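The two tests can be sketched with the Python standard library alone. The sketch below computes a tie-corrected Kruskal–Wallis H for three groups, using the chi-square approximation with 2 degrees of freedom (for which the survival function is exactly exp(−H/2)), and an exact two-sided Mann–Whitney p-value obtained by enumerating every reassignment of the pooled values. This is an illustration, not the software used in the study (which the text does not name); with groups as small as three, the chi-square approximation is rough. For six controls against three disordered speakers, complete separation of the groups gives an exact two-sided p of 2/84 ≈ 0.024, the smallest value the test can return at these sample sizes.

```python
import itertools
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3(a, b, c):
    """Tie-corrected H for three groups; p from chi-square with df = 2."""
    groups = [a, b, c]
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    correction = 1 - sum(t ** 3 - t for t in (pooled.count(v) for v in set(pooled))) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, math.exp(-h / 2)


def u_statistic(x, y):
    """U for x: each pair with x > y counts 1, each tie counts 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def mann_whitney_exact(x, y):
    """Two-sided exact p from the permutation distribution of U."""
    pooled = list(x) + list(y)
    u_obs = u_statistic(x, y)
    us = []
    for idx in itertools.combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        us.append(u_statistic(xs, ys))
    lower = sum(u <= u_obs for u in us) / len(us)
    upper = sum(u >= u_obs for u in us) / len(us)
    return u_obs, min(1.0, 2 * min(lower, upper))


# Hypothetical ratings: six controls versus three disordered speakers.
print(mann_whitney_exact([1.0, 1.1, 1.2, 1.3, 1.4, 1.5], [3.0, 3.5, 4.0]))
print(kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

SciPy provides `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` for the same purpose; the standard-library version above simply makes the calculations explicit.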

3. Results

(a). Task 1: sentence repetition

Table 3 summarizes the results for the various rhythm measures, as well as articulation rate, syllable count and the perceptual analysis of the speakers' rhythmic performance for the three groups for task 1. The statistical analysis shows clear perceptual differences between the disordered speakers and the control group (table 4). Post hoc analyses indicated significant differences between the ataxic and control speakers (p = 0.024) as well as between the hypokinetic and control speakers (p = 0.024), but not between the two disordered groups (p = 0.072). By contrast, none of the rhythm metrics yielded any significant group differences despite the fact that the group means for both dysarthric groups tended to fall more towards the syllable-timed end of the spectrum (e.g. higher values for %V, lower values for nPVI-V, nPVI-VC or VarcoV; table 3). It is noteworthy that measures differed from each other in terms of how they captured group performance. For example, the hypokinetic group displayed considerably higher variability than the other two groups for nPVI-V, nPVI-VC and VarcoV, suggesting that the lack of significant differences might have been due to within-group variability. However, this explanation does not apply to all measures equally; for the other metrics, standard deviation values for the hypokinetic group are comparable to or even below those of the other two groups.

Table 3.

Descriptive statistics for each of the measurement parameters split by participant group.

measure | control mean | control s.d. | control range | ataxic mean | ataxic s.d. | ataxic range | hypokinetic mean | hypokinetic s.d. | hypokinetic range
ΔV | 0.08 | 0.02 | 0.05 | 0.09 | 0.04 | 0.07 | 0.07 | 0.01 | 0.02
ΔC | 0.05 | 0.02 | 0.04 | 0.06 | 0.03 | 0.06 | 0.05 | 0.01 | 0.01
%V | 53.58 | 1.11 | 2.47 | 55.40 | 2.15 | 3.87 | 57.39 | 2.37 | 4.38
nPVI-V | 73.45 | 3.63 | 9.84 | 71.63 | 4.27 | 7.92 | 58.47 | 22.41 | 42.47
rPVI-C | 0.05 | 0.02 | 0.05 | 0.05 | 0.03 | 0.06 | 0.04 | 0.01 | 0.02
VarcoV | 56.47 | 5.76 | 17.47 | 51.78 | 3.68 | 7.29 | 49.23 | 15.30 | 30.19
VarcoC | 48.86 | 12.02 | 29.86 | 44.60 | 16.34 | 32.46 | 45.68 | 10.54 | 20.21
rPVI-VC | 0.14 | 0.03 | 0.08 | 0.16 | 0.08 | 0.15 | 0.13 | 0.02 | 0.04
nPVI-VC | 60.97 | 6.42 | 17.65 | 51.24 | 3.16 | 5.70 | 50.64 | 12.00 | 22.43
articulation rate | 4.30 | 0.77 | 1.96 | 3.57 | 1.12 | 2.23 | 3.91 | 0.94 | 1.83
no. of syllables | 8.03 | 0.08 | 0.20 | 7.20 | 0.53 | 1.00 | 6.60 | 0.87 | 1.60
perceptual rating | 1 | 0 | 0 | 3.53 | 0.87 | 1.70 | 2.60 | 0.17 | 0.30

Table 4.

Group differences (control versus ataxic versus hypokinetic dysarthria) across rhythm metrics and other performance measures using the Kruskal–Wallis test. Significant results are marked with an asterisk (*).

measure | p-value
ΔV | 0.645
ΔC | 0.920
%V | 0.094
nPVI-V | 0.655
rPVI-C | 0.929
VarcoV | 0.437
VarcoC | 0.944
rPVI-VC | 0.676
nPVI-VC | 0.212
articulation rate | 0.499
no. of syllables | 0.010*
perceptual rating | 0.005*

The only other significant result was for the syllable count, with lower values for the speakers with dysarthria, indicating that they inappropriately omitted syllables.

This result suggested significant differences in articulatory performance in the dysarthric groups, and it was therefore investigated further to assess whether particular speech characteristics might have affected the results of the rhythm metrics. The analysis also served to examine the second research question, namely whether methodological issues might have affected the results. A number of interesting issues were identified, which are illustrated in figure 2. The figure represents a time-aligned map of segment durations (i.e. overall sentence durations were equalized while maintaining the timing relationships of individual segments) for one control and three disordered speakers, based on the Praat labelling of consonantal and vocalic intervals performed for the metric analysis (cf. figure 1). Each speaker's nPVI-V score is also listed. Given the large number of metrics investigated in this study, it was not possible to represent all results in the figure. As none of the metrics performed better than the others at differentiating disordered speakers from healthy controls, the nPVI-V was chosen because it is the most frequently reported measure in clinical research to date, so that results can be more easily related to previous studies.
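To make the interval-based measures concrete, the sketch below computes several of the metrics in table 3 from a sequence of labelled vocalic (V) and consonantal (C) intervals, such as those produced by the Praat labelling described above. It follows the commonly cited definitions (%V, ΔV and ΔC after Ramus et al. [16]; VarcoV and VarcoC as rate-normalized standard deviations, cf. White & Mattys [17]; rPVI and nPVI as in Low et al. [18] and Grabe & Low [19]). Details such as using the population rather than the sample standard deviation are assumptions of this illustration, the PVI functions need at least two intervals of each type, and nPVI-VC and rPVI-VC are omitted. Durations are in seconds, and the function names and example intervals are hypothetical.

```python
import statistics


def durations(intervals, label):
    """Durations (s) of all intervals carrying the given label ('V' or 'C')."""
    return [dur for lab, dur in intervals if lab == label]


def raw_pvi(durs):
    """rPVI: mean absolute difference between successive intervals."""
    diffs = [abs(a - b) for a, b in zip(durs, durs[1:])]
    return sum(diffs) / len(diffs)


def normalized_pvi(durs):
    """nPVI: mean of |d_k - d_k+1| / mean(d_k, d_k+1), scaled by 100."""
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durs, durs[1:])]
    return 100 * sum(terms) / len(terms)


def varco(durs):
    """Varco: standard deviation as a percentage of the mean duration."""
    return 100 * statistics.pstdev(durs) / statistics.mean(durs)


def rhythm_metrics(intervals):
    """Compute a subset of the table 3 metrics from (label, duration) pairs."""
    v = durations(intervals, "V")
    c = durations(intervals, "C")
    return {
        "%V": 100 * sum(v) / (sum(v) + sum(c)),
        "deltaV": statistics.pstdev(v),
        "deltaC": statistics.pstdev(c),
        "VarcoV": varco(v),
        "VarcoC": varco(c),
        "rPVI-C": raw_pvi(c),
        "nPVI-V": normalized_pvi(v),
    }


# Hypothetical C-V-C-V-C-V utterance (durations in seconds).
example = [("C", 0.08), ("V", 0.15), ("C", 0.06),
           ("V", 0.075), ("C", 0.09), ("V", 0.15)]
metrics = rhythm_metrics(example)
```

Because each nPVI term divides a pairwise difference by the mean of that pair, the nPVI is designed to reduce sensitivity to overall speech rate, whereas ΔV, ΔC and rPVI-C remain in seconds, which is consistent with the magnitudes reported for them in table 3.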

Figure 2.

Time-aligned map showing consonant and vowel interval durations for ‘Tony knew you were lying in bed’ for a control (a), two ataxic (b,c) and one PD speaker (d), as well as nPVI-V values for these speakers. % = intra-utterance pause; for ease of reference, the productions have been transcribed orthographically. Please note that segments in brackets were not produced by the speakers and the final consonant in ‘bed’ was excluded from analysis. (Online version in colour.)

Figure 2 shows the data of only one control speaker, as it was not possible to present mean group performance in this format. This individual was selected because she performed close to the group mean for the nPVI-V and showed the expected vowel duration pattern, i.e. vowels in syllables carrying strong beats (‘To-’, ‘knew’, ‘ly-’ and ‘bed’) were long and the remaining vowels were short. One exception to this pattern was the /i/ in ‘Tony’, where the inherent nature of the vowel did not allow as much reduction as in the other unstressed syllables. In addition, the /ɛ/ in ‘bed’ was longer than might be expected, owing to phrase-final lengthening. All control speakers consistently omitted the second syllable of ‘lying’, producing it as a single syllable.

Comparing this pattern with the disordered speakers highlights a number of differences. Speakers (b) and (c), who had ataxic dysarthria, clearly produced deviant vowel durations by lengthening unstressed syllables (‘you’ in speaker (b) and ‘Tony’ in speaker (c)), the latter actually reversing the normal timing relationship within ‘Tony’. A slightly different kind of timing shift can be seen in the sequence ‘knew you were’. In the control speaker, the vowels in ‘you’ and ‘were’ were of similar length, but there was a contrast between ‘knew’ and ‘you’, giving a ‘long–short–short’ pattern. In speaker (b), by contrast, the lengthening of ‘you’ reduced the contrast between ‘knew’ and ‘you’ but increased the difference between ‘you’ and ‘were’, resulting in a ‘long–long–short’ pattern. In all of these examples, the degree of variability between successive vowels was thus maintained despite clear deviations from normal speech rhythm.

A different feature that could be expected to affect the rhythm metrics was the deletion (or elision) of segments and syllables. For example, both speakers (b) and (c) elided the unstressed word ‘in’, so that two long vowels appeared next to each other instead of the long–short–long sequence produced by the control speaker. Speaker (c) shows a further example with the elision of ‘were’, i.e. ‘you were’ is inappropriately reduced to the single syllable ‘you're’.
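The two observations above can be checked with a few hypothetical vowel durations (in ms; these are illustrative values, not measurements from the speakers in figure 2). Because the nPVI sums absolute differences between neighbouring vowels, shifting which vowel is long can leave the score unchanged, whereas deleting a short vowel between two long ones removes the variability altogether. The function follows the standard nPVI definition; the variable names are hypothetical.

```python
def npvi(durations):
    """Normalized PVI (x100) over successive intervals; needs >= 2 values."""
    pairs = list(zip(durations, durations[1:]))
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


# Hypothetical vowel durations (ms) for 'knew you were'.
control_pattern = [150, 75, 75]    # long-short-short
shifted_pattern = [150, 150, 75]   # 'you' lengthened: long-long-short

# Hypothetical vowel durations (ms) for 'lying in bed', with and without 'in'.
with_in = [150, 75, 150]           # long-short-long
in_elided = [150, 150]             # two adjacent long vowels

for name, durs in [("control", control_pattern), ("shifted", shifted_pattern),
                   ("with 'in'", with_in), ("'in' elided", in_elided)]:
    print(f"{name:12s} nPVI-V = {npvi(durs):.1f}")
```

The control and shifted patterns both score 33.3, although only the first is rhythmically appropriate, while eliding ‘in’ drops the score from 66.7 to 0.0.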

These features were not restricted to the sentence repetition task but were also evident in the speakers' naturalistic speech data. Figure 3b,c presents further examples of segment or syllable deletion. The word ‘prefer’ loses the ‘re’ of its first syllable, creating a new ‘pf’ consonant cluster and reducing the word to a single syllable. This also reduced the variability between successive vowels: the long–short–long pattern of ‘I prefer’ is reduced to two adjacent long vowels. The reduction of ‘accommodation’ to ‘komdeish’ is an even more severe example of this process, effectively deleting three of the five syllables in the word and again maintaining only the two long vowels.

Figure 3.

Examples of articulatory deviations. (a) ‘Swimming with dolphins’—the speaker is equalizing the length of the vowels. (b) ‘I prefer’—the speaker has deleted most of the segments in the first syllable, leaving only the first consonant /p/. (c) ‘Accommodation’—deletion of first, third and fifth syllable, resulting in an abnormal production as ‘komdeish’. (d) ‘Tony knew you were lying in bed’—the speaker is fusing most of the utterance into one single word with no recognizable consonants until the /n/ of ‘in’, resulting in an abnormally long vowel. (Online version in colour.)

Figure 3a, on the other hand, is a further example of the inversion of expected vowel duration. The phrase ‘swimming with dolphins’ carries stress on the first syllable of its first and final words, so the vowels in these syllables would be expected to be slightly longer than those in the unstressed syllables. In this speaker, however, the stressed vowels are between 86 and 100 ms long, whereas the unstressed ones last between 112 and 133 ms. This gives the perceptual impression that the person is stressing the wrong syllables (i.e. ‘-ming’ and ‘-phins’).

The final speaker in figures 2 and 3 demonstrates a very different speech characteristic, which clearly affected the rhythm metric and resulted in a considerably higher nPVI-V than for the other speakers. Speaker (d) had PD rather than ataxia and presented with problems of segmental production, which resulted in her omitting all consonants and merging five separate syllables into one continuous vowel. This is a common problem in speakers with PD, whose reduced speed and range of movement can cause stops and fricatives to be replaced by approximants or, in severe cases, to be elided completely so that only vowels remain, as evidenced in the first part of speaker (d)'s utterance. Although the number of syllables could be identified perceptually through the changes in vowel quality, the methodology for the acoustic rhythm metrics prescribes that consecutive vowel or consonant intervals be labelled as one unit, which resulted in an excessively long vocalic interval for this speaker. The large difference in length between this interval and the neighbouring one leads to the high nPVI-V. The impact of this segmentation rule becomes apparent when it is ignored and the different vowels are separated: the speaker's nPVI-V then drops to 66, i.e. below rather than above the control mean (73.45; table 3). However, this alternative was not without problems either, as the poor acoustic landmarks available for identifying the boundaries between the different vowels made their separation less reliable.
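The effect of the segmentation convention can be illustrated with a small sketch. The hypothetical function `merge_adjacent` below applies the rule described above, collapsing any run of consecutive intervals with the same label into a single interval before the vowel durations are passed to the nPVI. The durations are invented for illustration (five 100 ms vowels with no intervening consonants, followed by two ordinary CV syllables) and do not reproduce speaker (d)'s measurements.

```python
def merge_adjacent(intervals):
    """Collapse runs of consecutive same-label intervals into one interval."""
    merged = []
    for label, dur in intervals:
        if merged and merged[-1][0] == label:
            merged[-1] = (label, merged[-1][1] + dur)  # extend the current run
        else:
            merged.append((label, dur))
    return merged


def npvi(durations):
    """Normalized PVI (x100) over successive intervals; needs >= 2 values."""
    pairs = list(zip(durations, durations[1:]))
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


def vowel_durations(intervals):
    """Durations of the vocalic intervals only."""
    return [dur for label, dur in intervals if label == "V"]


# Hypothetical labelling (seconds): five vowels with no consonants between them,
# followed by two consonant-vowel syllables.
separate = [("V", 0.10)] * 5 + [("C", 0.07), ("V", 0.12), ("C", 0.08), ("V", 0.15)]

merged_score = npvi(vowel_durations(merge_adjacent(separate)))
separate_score = npvi(vowel_durations(separate))
print(f"merged: {merged_score:.1f}, separated: {separate_score:.1f}")
```

Here the merged labelling gives roughly 72.4 and the separated labelling roughly 6.7. The gap is exaggerated by the invented durations, but the direction matches the change reported for speaker (d).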

(b). Task 2: syllable repetition

Table 5 summarizes the results of the DDK analysis, giving the perceptual rating, the variability measure (COV) and the articulation rate. For each measure, the mean across the three syllable types is also given, as the three syllables were pooled to reduce the number of comparisons in the statistical analysis. This was deemed appropriate because they essentially represented the same speech task, and no particular syllable stood out as eliciting behaviours that could not be observed in the others.
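The two timing measures in table 5 can be sketched as follows. This sketch assumes, for illustration only, that COV is the standard deviation of successive syllable periods (onset-to-onset intervals) expressed as a percentage of their mean, and that rate is given in syllables per second as the reciprocal of the mean period. The study's exact measurement points are not restated in this section, so both definitions, the function names and the example periods are assumptions.

```python
import statistics


def ddk_cov(periods):
    """Coefficient of variation (%) of syllable periods: 100 * s.d. / mean."""
    return 100 * statistics.pstdev(periods) / statistics.mean(periods)


def ddk_rate(periods):
    """Repetition rate in syllables per second, as 1 / mean period."""
    return 1 / statistics.mean(periods)


# Hypothetical /pa/ repetitions (seconds between successive syllable onsets).
steady = [0.15, 0.15, 0.15, 0.15]
uneven = [0.15, 0.16, 0.14, 0.15]

for name, periods in [("steady", steady), ("uneven", uneven)]:
    print(f"{name}: COV = {ddk_cov(periods):.2f}%, rate = {ddk_rate(periods):.2f} syll/s")
```

Both measures use syllable timing only, so F0 and intensity irregularities of the kind discussed for figure 4 do not enter either of them.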

Table 5.

Results for rate, variability and perceptual evaluation for the three DDK tasks.

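measure | syllable | control mean | control s.d. | control range | ataxic mean | ataxic s.d. | ataxic range | hypokinetic mean | hypokinetic s.d. | hypokinetic range
perceptual rating | /pa/ | 1 | 0 | 0 | 3.10 | 0.36 | 0.70 | 3.83 | 0.29 | 0.50
perceptual rating | /ta/ | 1 | 0 | 0 | 3.87 | 0.81 | 1.50 | 3.43 | 0.12 | 0.20
perceptual rating | /ka/ | 1 | 0 | 0 | 4.03 | 0.25 | 0.50 | 3.70 | 1.28 | 2.50
perceptual rating | mean | 1 | 0 | 0 | 3.67 | 0.48 | 0.90 | 3.66 | 0.56 | 1.07
COV | /pa/ | 5.76 | 0.78 | 2.33 | 6.91 | 3.22 | 5.85 | 16.10 | 5.35 | 9.97
COV | /ta/ | 7.24 | 1.82 | 4.31 | 10.48 | 7.97 | 13.90 | 10.89 | 3.20 | 5.64
COV | /ka/ | 7.28 | 1.28 | 3.15 | 8.06 | 3.01 | 5.91 | 13.85 | 7.30 | 13.16
COV | mean | 6.76 | 1.29 | 3.26 | 8.49 | 4.73 | 8.55 | 13.61 | 5.28 | 9.59
rate | /pa/ | 6.84 | 0.61 | 1.79 | 3.83 | 1.13 | 2.24 | 5.45 | 0.36 | 0.63
rate | /ta/ | 6.66 | 0.49 | 1.51 | 3.62 | 1.20 | 2.24 | 6.52 | 0.87 | 1.67
rate | /ka/ | 6.16 | 0.59 | 1.66 | 3.19 | 1.21 | 2.09 | 5.64 | 0.68 | 1.31
rate | mean | 6.55 | 0.56 | 1.65 | 3.54 | 1.18 | 2.19 | 5.87 | 0.64 | 1.20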

Despite elevated group means suggesting more variable behaviour in the dysarthric speakers, the Kruskal–Wallis test did not indicate a significant group difference for the variability measure (COV, p = 0.101). However, the perceptual evaluation and articulation rate separated the control speakers from the dysarthric groups (p = 0.009 for both variables), with post hoc analyses showing significant differences between the controls and each of the dysarthric groups (p = 0.024). Although the hypokinetic participants showed a considerably different mean rate from the ataxic speakers, the post hoc analysis only marginally confirmed this (p = 0.05).

Following this renewed mismatch between the perceptual evaluation and the speech timing metric, a further qualitative evaluation of the acoustic data was performed, paying attention to the clarity of syllable production as well as to intensity and F0 variability between successive syllables. Figure 4 presents some examples of the issues this analysis highlighted. The first speaker (1) is a control participant, demonstrating relatively regular durations, intensity peaks and F0 levels, with clear separation of syllables. In comparison, speaker (2), who had ataxic dysarthria, shows much more variability in her F0 performance, and speaker (3), a participant with hypokinetic dysarthria, produced highly variable intensity peaks. Finally, speaker (4) shows behaviour typical of PD in that he accelerated towards the end of the utterance despite having been quite regular initially. He also presented with reduced clarity of syllable production, particularly during the acceleration period, as shown by the less defined gaps between syllables.

Figure 4.

Praat screenshots demonstrating performance characteristics for the repetition of /pa/. (1): Control speaker; (2) and (3): speakers with ataxic dysarthria; (4): speaker with hypokinetic dysarthria. The upper window represents the waveform, the lower window shows the spectrogram with intensity (yellow peaks) and F0 levels (blue lines) superimposed. Time is displayed on the x-axis, the y-axis indicates frequency and intensity levels. The darker shades in the spectrogram represent speech, i.e. the syllable /pa/, the lighter or blank areas show the pauses between the syllables (including the closure phase for the consonant /p/). (Online version in colour.)

Although each of the disordered speakers displayed a degree of durational variability in addition to the above features (see speaker (3) in particular), this was not sufficient to take them into the abnormal range, as control speakers were not completely regular in their repetitions either (note for example the shorter third syllable for speaker (1)). It could therefore be hypothesized that the listeners based their judgements not only on durational regularity, but might also have been influenced by other factors such as those described above, thus leading to the mismatch between perceptual results and the COV measure.

4. Discussion

This paper aimed to explore the degree to which acoustic rhythm metrics in their current form are able to reflect the nature of impairment in people with MSDs, and whether their results might be affected by measurement conventions, in order to better understand how applicable existing rhythm metrics are to the analysis of disordered speech. The results showed a poor relationship between the acoustic and perceptual measures: none of the metrics applied captured the differences between control and disordered speakers that the listeners had identified perceptually. There was also some suggestion that, in at least some cases, dysarthric speech symptoms such as inappropriate duration, segmental articulation errors or changes in intensity and F0 modulation either influenced the results of the metrics directly or affected listeners' perception of rhythm, leading to the mismatch between the two types of analysis. These findings have implications for the use of acoustic metrics to characterize speech performance in disordered populations: care needs to be taken in interpreting their results, and additional analysis methods might be needed to arrive at a valid characterization of a speaker's performance.

The lack of differentiation between groups by the rhythm metrics in task 1 was unexpected, given that speakers had been selected because they perceptually presented with rhythmic deviance. The current findings contradict earlier studies that demonstrated good sensitivity of the investigated rhythm metrics to different types of speech impairment [20], as well as significant correlations between at least one of the measures applied here (nPVI-V) and perceptual ratings of disordered speakers [28]. On the other hand, they confirm Kim et al.'s [27] findings, as these authors also failed to differentiate disordered from healthy speakers with their PVI measure. Small sample size and intra-group variability are frequently cited as reasons for a lack of statistical significance, and both apply to this study. A larger sample might well have produced clearer group differences for the rhythm metrics. In addition, the highly repetitive nature of the task might have influenced the results, blurring the distinction between speaker groups. The results reported by Henrich et al. [28] could support this explanation, as they also observed that groups differed in some tasks but not in others. Irrespective of these explanations, a considerable mismatch remains between the acoustic and perceptual analyses. More importantly, this study identified a number of issues that appeared to affect the way rhythmic deviations were captured by the metrics. These would apply even where the metrics did highlight differences from healthy speakers, and it is thus important to consider them in future research as well as in clinical practice.

Two patterns in particular were identified in both the sentence repetition and the spontaneous speech tasks: changes to vowel duration and segment deletion. As described in the Results section, both altered the normal speech timing pattern; however, they were not captured equally well by the metrics. The effects of segment deletion, which results in a proliferation of stressed syllables and thus smaller durational differences between successive vowels, should become apparent in the rhythm metrics. Changes to vowel duration, on the other hand, can have a more serious effect on rhythm measures, particularly if the speaker is so severely impaired that the normal timing relationships are reversed. Rhythm metrics focus solely on the duration of vowels, irrespective of whether their relative timing is correct, and might therefore yield results within the normal range even when speakers produce highly inappropriate patterns. It should be noted that none of the speakers produced exclusively one or the other of the described deviations, and parts of each utterance were always produced correctly. This could be another reason why the rhythm metrics did not show the expected group differences, as the effects of the different types of speech deviation on the metrics may have cancelled each other out.

To my knowledge, no other published clinical data currently report mismatches between acoustic and perceptual results, but there are some parallels in cross-linguistic investigations of rhythm. Arvaniti [39] observed that different types of accented English were classified into the same rhythm category despite diverse speech patterns: English spoken by native Korean and by native Spanish speakers was classified as syllable-timed according to the rhythm metric results, but this was attributable to phrase-final lengthening by the Korean speakers as opposed to lenition of intervocalic consonants by the Spanish speakers.

Both the current results and Arvaniti's thus highlight the possibility that metrics might not pick up on rhythmic differences that are perceptually evident. However, they also suggest that this problem could be addressed by cross-checking results against an additional articulatory analysis of the participants' speech. This conclusion has implications not only for research methods but also for effective clinical practice. For example, as discussed above, syllable deletion could make the rhythm metrics suggest a more syllable-timed rhythm. This result sits well with previous research, particularly into ataxic dysarthria, which is characterized by equalized syllable durations [2,20,28]. However, this equalization is often assumed to result from a lengthening of unstressed syllables rather than from syllable deletion. Correctly identifying the underlying reason for the observed acoustic and even perceptual findings is very important for clinical decision-making, as the treatment strategies for addressing inappropriate vowel length differ from those for maximizing articulatory accuracy, and wrong assumptions about the nature of the problem could lead to ineffective treatment of the disorder.

These data furthermore highlight that it is important to consider the overall speech context in the detailed evaluation of the data, rather than particular errors in isolation. As indicated in figure 2, both speaker (b) and speaker (c) lengthened unstressed syllables (e.g. ‘you’ in speaker (b), ‘Tony’ in speaker (c)). However, there is an important distinction between the patterns produced by these two speakers. Speaker (c) inappropriately lengthened a weak syllable, giving ‘To-’ and ‘-ny’ equal emphasis (cf. ‘swimming’ in figure 3a), and thus presented with the syllable-timed speech commonly associated with ataxic dysarthria. The changes in speaker (b), on the other hand, could at least partly be attributed to phrase-final lengthening, as she inserted a pause after ‘you’ (figure 2). Again, different treatment approaches would be warranted depending on the reason for the unnatural lengthening of the unstressed syllable: treatment might focus on phrasing and pause placement for speaker (b), to reduce the amount of intrusive phrase-final lengthening, whereas speaker (c) might benefit more from exercises aimed at producing appropriate distinctions between stressed and unstressed syllables.

Further problems specific to clinical data were highlighted by the speech patterns of speaker (d) (figure 2), who fused several words into one long vowel. These data show that segmentation issues have to be considered carefully when applying rhythm metrics to disordered populations: the choice of labelling method determined whether this speaker was described as producing a higher or lower degree of variability between successive syllables. It could even be argued that applying a metric was not appropriate in her case. While she represented an extreme example of how disordered speech features might require methodological adjustments for rhythm metrics to remain valid reflections of speech timing, other, more subtle issues might also have affected the acoustic results, potentially providing further explanations for the lack of group differences in the rhythm metrics. Again, a more detailed analysis of the speech data, in addition to acoustic metrics and perceptual analysis, might help to highlight any methodological issues arising from the particular rhythmic features of a speaker.

As with the sentence repetition task, the DDK data showed fewer group differences than had been anticipated on the basis of previous research. Irregular syllable repetitions in DDK tasks are frequently cited as strong diagnostic markers of speech disturbance in speakers with dysarthria. Yet again, the current participants did not show the increases in durational variability that the results of the perceptual analysis would have led one to expect. More detailed investigation of these data revealed two possible issues. A number of speakers presented with reduced clarity of syllable production, particularly irregular release bursts for the stop consonants and less clear boundaries between syllables, which could have produced the perceptual impression of irregular rhythm. In addition, some speakers' intensity and F0 performance was more irregular than that of the control participants. It is thus possible that listeners perceived rhythmic disturbances not because of irregularities in timing between successive syllables per se, but because of co-occurring speech disturbances such as inconsistent intensity and F0 production and reduced articulatory accuracy.

The idea that intensity and F0 influence the perception of rhythmic disorder is interesting in that it reflects similar discussions in the cross-linguistic arena. A number of researchers now argue that defining rhythm in terms of timing alone is somewhat flawed, and that other features such as F0 and intensity patterns, as well as speech rate, need to be taken into account [31,40,41]. Arvaniti [31] furthermore calls for a reconsideration of Dauer's [42] argument that rhythm should be viewed as a function of stress placement. Stress is based on prominence, realized through changes in duration, F0 and intensity, rather than on temporal patterns alone. This suggestion sits well with the current data, as stress production is a prominent area of disturbance in speakers with dysarthria, and recent research into disordered intonation has shown that these speakers tend to produce shorter phrases and to overaccentuate, i.e. place more pitch accents in utterances than healthy controls [43,44]. In addition, Lowit et al. [45] have already demonstrated a link between intonational and rhythmic disturbance in an exploratory study of speakers with ataxia. In view of the evidence from research into both unimpaired and disordered speech, there is thus an argument for investigating rhythm beyond the confines of speech timing features.

In summary, the results of the current investigation lend further support to the need for a multi-layered approach to characterizing rhythmic performance in a clinical context. It is not sufficient to capture timing characteristics alone without considering how these combine with intensity and F0 production to create the rhythmic patterns of the observed speech sample. In addition, these data suggest that measurement conventions developed with unimpaired speech data might need careful evaluation to determine their appropriateness for the analysis of disordered populations. Finally, the results highlight the value of a detailed analysis of segmental speech performance and the context in which it occurs, in order to (i) validate the results of the perceptual and acoustic analyses, and (ii) arrive at appropriate conclusions about why speakers are identified as presenting with disordered rhythm, which will inform future research studies and aid clinicians in their decision-making.

While not under direct investigation in this study, the results raise some further questions that could be the focus of future research. First, the effects on rhythm metrics of abnormally short utterances, and of the associated increase in phrase-final lengthening, might benefit from further investigation, particularly as short utterances are prevalent in most types of MSD. Second, although many rhythm metrics are normalized for rate, the effects of rate variation on rhythm would also benefit from further investigation, to better understand which abnormalities observed in disordered speakers are due to actual motor control deficits and which can be attributed to reductions in articulation rate. Finally, this study did identify group differences in the number of syllables produced, the only variable besides the perceptual ratings to yield statistically significant results. This measure clearly ties in with the observed feature of syllable deletion. While it is by no means suggested that this variable could replace rhythm metrics in capturing speech timing characteristics, it would be interesting to investigate its clinical diagnostic value, as, unlike the rhythm metrics, it is a relatively simple and fast measure that does not require specialized software or analysis skills.

5. Conclusion

In conclusion, the field of rhythm research still has far to go in establishing a valid and reliable suite of measurement tools that can capture the intricate differences between languages or between disordered and unimpaired speakers. Although each of the findings discussed above needs to be interpreted with caution, given the small numbers of participants and listeners included in the experiments, this detailed investigation has highlighted a number of issues that question, at least to some degree, the validity of existing quantitative approaches to the study of disordered rhythm. It also stresses the importance of more detailed analysis than is common in both research and clinical practice, to ensure that the correct conclusions are drawn and appropriate clinical decisions are made.

In particular, any future tool needs to look beyond timing as the only measurement parameter. In addition, more work is necessary to better understand performance variations, particularly those caused by speaker individuality and the structure of the speech materials. Given the similarities in the concerns that have been raised about current methods in both the theoretical and applied field, it appears sensible to ensure that future advances in either field of research take cognizance of the issues raised in the other.

Acknowledgements

I extend my thanks to all participants in this research and to Frits van Brenk for providing some of his data for further analysis.

Endnote

1

Further information on the formulae and procedure applied for each measure can be found in the original source document referenced for the metrics in the Introduction.

Funding statement

The collection of the original data was supported by a Scottish Funding Council Studentship Award.

References

  • 1.Darley FL, Aronson AE, Brown JR. 1969. Differential diagnostic patterns of dysarthria. J. Speech Lang. Hear. Res. 12, 246–269. ( 10.1044/jshr.1202.246) [DOI] [PubMed] [Google Scholar]
  • 2.Duffy JR. 2013. Motor speech disorders—substrates, differential diagnosis, and management, 3rd edn St. Louis, MO: Elsevier, Mosby. [Google Scholar]
  • 3.Skodda S, Visser W, Schlegel U. 2011. Vowel articulation in Parkinson's disease. J. Voice. 25, 467–472. ( 10.1016/j.jvoice.2010.01.009) [DOI] [PubMed] [Google Scholar]
  • 4.Tanaka Y, Nishio M, Niimi S. 2011. Vocal acoustic characteristics of patients with Parkinson's disease. Folia Phoniatr. Logop. 63, 223–230. ( 10.1159/000322059) [DOI] [PubMed] [Google Scholar]
  • 5.Huber JE, Darling M. 2011. Effect of Parkinson's disease on the production of structured and unstructured speaking tasks: respiratory physiologic and linguistic considerations. J. Speech Lang. Hear. Res. 54, 33–46. ( 10.1044/1092-4388(2010/09-0184)) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Martinez-Sanchez F. 2010. Speech and voice disorders in Parkinson's disease. Rev. Neurol. 51, 542–550. [PubMed] [Google Scholar]
  • 7.Blanchet PG, Snyder GJ. 2009. Speech rate deficits in individuals with Parkinson's disease: a review of the literature. J. Med. Speech Lang. Pathol. 17, 1–7. [Google Scholar]
  • 8.Ackermann H, Graber S, Hertrich I, Daum I. 1999. Phonemic vowel length contrasts in cerebellar disorders. Brain Lang. 67, 95–109. ( 10.1006/brln.1998.2044) [DOI] [PubMed] [Google Scholar]
  • 9.Ackermann H, Hertrich I. 1994. Speech rate and rhythm in cerebellar dysarthria: an acoustic analysis of syllabic timing. Folia Phoniatr. Logop. 46, 70–78. ( 10.1159/000266295) [DOI] [PubMed] [Google Scholar]
  • 10.Schalling E, Hammarberg B, Hartelius L. 2008. Perceptual and acoustic analysis of speech in individuals with spinocerebellar ataxia (SCA). Logop. Phoniatr. Vocol. 32, 31–46. ( 10.1080/14015430600789203) [DOI] [PubMed] [Google Scholar]
  • 11.Kent RD, Kent JF, Duffy JR, Thomas JE, Weismer G, Stuntebeck S. 2000. Ataxic dysarthria. J. Speech Lang. Hear. Res. 43, 1275–1289. ( 10.1044/jslhr.4305.1275) [DOI] [PubMed] [Google Scholar]
  • 12.Wang Y, Kent R, Duffy J, Thomas J. 2009. Analysis of diadochokinesis in ataxic dysarthria using the motor speech profile program. Folia Phoniatr. Logop. 61, 1–11. ( 10.1159/000184539) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Darley FL, Aronson AE, Brown JR. 1969. Clusters of deviant speech dimensions in the dysarthrias. J. Speech Hear. Res. 12, 462–496. ( 10.1044/jshr.1203.462) [DOI] [PubMed] [Google Scholar]
  • 14.Pike K. 1945. The intonation of American English. Ann-Arbor, MI: University of Michigan Press. [Google Scholar]
  • 15.Lloyd James A. 1940. Speech signals in telephony. London, UK: Pitman&Sons. [Google Scholar]
  • 16. Ramus F, Nespor M, Mehler J. 1999. Correlates of linguistic rhythm in the speech signal. Cognition 73, 265–292. (doi:10.1016/S0010-0277(99)00058-X)
  • 17. White L, Mattys SL. 2007. Calibrating rhythm: first language and second language studies. J. Phon. 35, 501–522. (doi:10.1016/j.wocn.2007.02.003)
  • 18. Low EL, Grabe E, Nolan F. 2000. Quantitative characterizations of speech rhythm: ‘Syllable-timing’ in Singapore English. Lang. Speech 43, 377–401. (doi:10.1177/00238309000430040301)
  • 19. Grabe E, Low EL. 2002. Durational variability in speech and the rhythm class hypothesis. In Papers in laboratory phonology (eds Warner N, Gussenhoven C), pp. 515–546. Berlin, Germany: Mouton de Gruyter.
  • 20. Liss JM, White L, Mattys SL, Lansford K, Lotto AJ, Spitzer SM, Caviness JN. 2009. Quantifying speech rhythm abnormalities in the dysarthrias. J. Speech Lang. Hear. Res. 52, 1334–1352. (doi:10.1044/1092-4388(2009/08-0208))
  • 21. Deterding D. 2001. The measurement of rhythm: a comparison of Singapore and British English. J. Phon. 29, 217–230. (doi:10.1006/jpho.2001.0138)
  • 22. Fant G, Kruckenberg A, Nord L. 1991. Durational correlates of stress in Swedish, French and English. J. Phon. 19, 351–365.
  • 23. Stuntebeck S. 2002. Acoustic analysis of the prosodic properties of ataxic speech. Madison, WI: University of Wisconsin.
  • 24. Wang Y-T, Kent RD, Duffy JR, Thomas JE, Fredericks GV. 2006. Dysarthria following cerebellar mutism secondary to resection of a fourth ventricle medulloblastoma: a case study. J. Med. Speech Lang. Pathol. 14, 109–122.
  • 25. Liss J, Spitzer S, Lansford K, Choe Y-K, Kennerley K, Mattys S, White L, Caviness J. 2007. Quantifying speech rhythm deficits in dysarthria. J. Acoust. Soc. Am. 121, 3134–3135. (doi:10.1121/1.4782155)
  • 26. Hartelius L, Runmarker B, Andersen O, Nord L. 2000. Temporal speech characteristics of individuals with multiple sclerosis and ataxic dysarthria: ‘scanning speech’ revisited. Folia Phoniatr. Logop. 52, 228–238. (doi:10.1159/000021538)
  • 27. Kim Y, Kent RD, Weismer G. 2011. An acoustic study of the relationships among neurologic disease, dysarthria type, and severity of dysarthria. J. Speech Lang. Hear. Res. 54, 417–429. (doi:10.1044/1092-4388(2010/10-0020))
  • 28. Henrich J, Lowit A, Schalling E, Mennen I. 2006. Rhythmic disturbance in ataxic dysarthria: a comparison of different measures and speech tasks. J. Med. Speech Lang. Pathol. 14, 291–296.
  • 29. Renwick MEL. 2012. What does %V actually measure? Cornell Working Pap. Phon. Phonol. 3, 1–20.
  • 30. Wiget L, White L, Schuppler B, Grenon I, Rauch O, Mattys SL. 2010. How stable are acoustic metrics of contrastive speech rhythm? J. Acoust. Soc. Am. 127, 1559–1569. (doi:10.1121/1.3293004)
  • 31. Arvaniti A. 2012. The usefulness of metrics in the quantification of speech rhythm. J. Phon. 40, 351–373. (doi:10.1016/j.wocn.2012.02.003)
  • 32. van Brenk F, Lowit A. 2012. The relationship between acoustic indices of speech motor control variability and other measures of speech performance in dysarthria. J. Med. Speech Lang. Pathol. 20, 24–29.
  • 33. Mioshi E, Dawson K, Mitchell J, Arnold R, Hodges JR. 2006. The Addenbrooke's cognitive examination revised (ACE-R): a brief cognitive test battery for dementia screening. Int. J. Geriatr. Psychiatry 21, 1078–1085. (doi:10.1002/gps.1610)
  • 34. Smith A, Goffman L, Zelaznik HN, Ying G, McGillem C. 1995. Spatiotemporal stability and the patterning of speech movement sequences. Exp. Brain Res. 104, 493–501. (doi:10.1007/BF00231983)
  • 35. Boersma P, Weenink D. 1992–2013. Praat: doing phonetics by computer. See www.praat.org.
  • 36. Kent RD, Kim YJ. 2003. Toward an acoustic typology of motor speech disorders. Clin. Linguist. Phon. 17, 427–445. (doi:10.1080/0269920031000086248)
  • 37. Nakagawa S. 2004. A farewell to Bonferroni: the problems of low statistical power and publication bias. Behav. Ecol. 15, 1044–1045. (doi:10.1093/beheco/arh107)
  • 38. Perneger TV. 1998. What's wrong with Bonferroni adjustments. Br. Med. J. 316, 1236–1238. (doi:10.1136/bmj.316.7139.1236)
  • 39. Arvaniti A. 2009. Rhythm, timing and the timing of rhythm. Phonetica 66, 46–63. (doi:10.1159/000208930)
  • 40. Nolan F, Jeon H-S. 2014. Speech rhythm: a metaphor? Phil. Trans. R. Soc. B 369, 20130396. (doi:10.1098/rstb.2013.0396)
  • 41. Tilsen S, Arvaniti A. 2013. Speech rhythm analysis with decomposition of the amplitude envelope: characterizing rhythmic patterns within and across languages. J. Acoust. Soc. Am. 134, 628–639. (doi:10.1121/1.4807565)
  • 42. Dauer RM. 1987. Phonetic and phonological components of language rhythm. In Proc. XIth Intl Congr. Phonetic Sciences, 1–7 August 1987, Tallinn, vol. 5, pp. 447–449. Tallinn, Estonia: Academy of Sciences of the Estonian S.S.R.
  • 43. Lowit A, Kuschmann A. 2012. Characterizing intonation deficit in motor speech disorders: an autosegmental-metrical analysis of spontaneous speech in hypokinetic dysarthria, ataxic dysarthria, and foreign accent syndrome. J. Speech Lang. Hear. Res. 55, 1472–1484. (doi:10.1044/1092-4388(2012/11-0263))
  • 44. Kuschmann A, Lowit A, Miller N, Mennen I. 2012. Intonation in neurogenic foreign accent syndrome. J. Commun. Disord. 45, 1–11. (doi:10.1016/j.jcomdis.2011.10.002)
  • 45. Lowit A, Kuschmann A, MacLeod JM, Schaeffler F, Mennen I. 2010. Sentence stress in ataxic dysarthria—a perceptual and acoustic study. J. Med. Speech Lang. Pathol. 18, 77–82.

Articles from Philosophical Transactions of the Royal Society B: Biological Sciences are provided here courtesy of The Royal Society
