Journal of Speech, Language, and Hearing Research. 2025 Feb 19;68(7 Suppl):3568–3582. doi: 10.1044/2024_JSLHR-24-00381

Assessing Fundamental Frequency Variation in Speakers With Parkinson's Disease: Effects of Tracking Errors

Alena Portnova a, Annalise Fletcher a, Alan Wisler b, Stephanie A Borrie a
PMCID: PMC12337113  PMID: 39970199

Abstract

Purpose:

Automatic measurements of fundamental frequency (F0) typically contain tracking errors that can be challenging to accurately correct. This study assessed to what degree these errors change F0 summary statistics in speakers with Parkinson's disease (PD) and neurotypical adults. In addition, we include a case study examining how the removal of tracking errors influenced our ability to predict a perceptual outcome measure, speech expressiveness, associated with dysarthria and PD. Several different statistical approaches for characterizing F0 variability were used to demonstrate the influence of tracking errors.

Method:

Eight speakers with PD and eight neurotypical speakers were recorded reading The Caterpillar passage. F0 measurements were extracted in Praat and tracking errors were manually identified. The effect of tracking errors on F0 mean and standard deviation was statistically analyzed. Twenty listeners rated speech expressiveness across 80 sentences. The relationship between listener ratings and F0 variability was examined using different statistical approaches for characterizing F0 variability (with and without tracking errors).

Results:

Measurements of F0 standard deviation, but not F0 mean, were significantly affected by tracking errors. Relationships between measurements of F0 variability and expressiveness were strengthened when tracking errors were removed from data analysis.

Conclusions:

Tracking errors significantly alter F0 standard deviation values for both speakers with PD and neurotypical adults. Case study evidence also suggests that tracking errors can reduce the strength of relationships between F0 variability and perceptual outcome measures, such as speech expressiveness.


Fundamental frequency (F0) is the rate of vibration of the vocal folds during speech production, which corresponds to the perceived pitch of a voice (Duffy, 2019). Reduced F0 variability is frequently reported in speakers with Parkinson's disease (PD; e.g., Brabenec et al., 2017; Bunton et al., 2001; Harel et al., 2004; Kovac et al., 2024) and is generally believed to be the acoustic feature most closely correlated with listeners' perceptions of monotonicity (Kent et al., 1999). Reduced variation in F0 results in numerous communicative challenges, as F0 not only affects listener understanding but also provides key information concerning emotional expression (Bunton, 2006; Lieberman & Michaels, 1962). However, while measurements of F0 are often reported in studies of dysarthria due to PD, there has been little discussion of the challenges involved in extracting accurate F0 data within this population. Acoustic speech analysis tools are known to make numerous errors when tracking F0, and these errors may be particularly prevalent when voice quality is reduced. The result is outlying data points, which could significantly affect the summary statistics used for describing F0 variability.

Praat software (Boersma & Weenink, 2024) is the most widely used method for estimating F0 in the literature across many disciplines, including linguistics, computer science, audiology, and speech-language pathology (Strömbergsson, 2016). However, as discussed in Exner et al. (2023) and Lennes et al. (2016), Praat's “get pitch” function is not sufficiently robust to abnormal voice qualities. Irregular periodicity is thought to result in an increase in tracking errors within the F0 contour. Of particular concern are cases where F0 values are halved (i.e., the tracking of subharmonic frequencies) or doubled (i.e., the tracking of overtone frequencies), resulting in a series of frequency values that are lower or higher than the speaker's perceived pitch. These types of errors are commonly referred to as “octave jumps” and produce outlier values that may have a large impact on F0 distributions and summary statistics (Lennes et al., 2016).

One option for addressing F0 errors is to identify and discard these values from further statistical analysis. This strategy has been reported in numerous studies focused on both healthy and dysarthric speech. For instance, in their study of pitch in native and non-native Lombard speech (i.e., speaking in loud noise), Marcoux and Ernestus (2019) deleted doubling, halving, and creaky voice prior to calculating minimum and median F0. Similarly, Bowen et al. (2013) and Tykalova et al. (2014) reported removing all inconsistencies or incorrect detections of F0 after visual inspection of the pitch contour. Excluding the extreme, outlying values of minimum and maximum F0 can also be a strategy for removing errors, based on the location of those values within the utterance. For example, Van Der Bruggen et al. (2024) replaced extreme minimum and maximum values if they were isolated and were not associated with either phoneme boundaries or accented tones.

Another strategy for managing tracking errors is to directly correct errors in F0 estimates. For instance, some authors have dealt with F0 tracking errors by performing their own manual point-by-point additions to the F0 contour (e.g., Cantero & Font-Rotchés, 2020; Exner et al., 2023; Taylor, 1994). This method requires examining the F0 contour visually and listening to the speech. Based on a researcher's decision of whether or not the F0 contour corresponds to the perceived pitch, points of the contour might be added or removed. For example, Exner et al. (2023) described their process of manual correction in detail, stating that they measured the duration of each period in the waveform, identified at the zero crossing, and then calculated new F0 values as the inverse of the pitch period. This strategy has the advantage of minimizing the removal of F0 information from voiced segments. However, it is more cumbersome than simply removing F0 values, and has the potential to introduce new errors if not performed accurately.

Errors in F0 values may be substantially reduced by choosing appropriate settings when using Praat's speech analysis software. Several studies emphasize the importance of standardization and reporting of F0 settings and procedures when performing acoustic analyses of voice quality (Brockmann-Bauser & de Paula Soares, 2023; Vogel et al., 2009). However, studies in the field of speech disorders do not often specify the settings used to extract F0 values. It could be assumed, in most cases, that they are relying on Praat's standard, default settings to configure and query the pitch contour since there is no other “gold standard” setting. Nonetheless, further exploration of these settings would be beneficial, as it is possible that default floor/ceiling values are not the most appropriate for certain clinical populations, for example, when speech is characterized by lower F0 values or glottal fry. As highlighted in the Praat manual, the use of speaker-specific settings might be required for adequate F0 estimation in these cases (Boersma & Weenink, 2024).

Under ideal circumstances, Praat's F0 settings could be perfectly individualized based on the vocal characteristics of each speaker. However, as mentioned in Vogel et al. (2009), establishing individualized, speaker-appropriate pitch settings requires determining an analysis window length, intensity cutoffs, and expert knowledge of software configurations. To successfully establish this for every speaker in a study is likely to be extremely time- and resource-consuming in most cases. Manually altering settings for each person also introduces some degree of subjectivity (and opportunities for bias) in a previously objective acoustic measurement. A more helpful approach could be standardizing the selection of pitch range settings based on certain key speaker characteristics. For example, in their study, Vogel et al. (2009) found that simply using sex-specific pitch range settings improved the quality of F0 analysis.

Despite the improvements observed with sex-specific F0 settings, tracking errors remain a concerning issue, especially when analyzing disordered vocal qualities. Indeed, other techniques have been proposed to determine more specific floor/ceiling values for individual speakers, and reduce the number of potential “octave jumps” and other extreme F0 values. For example, de Looze and Hirst (2008) and de Looze et al. (2012) utilized a standardized two-step process for choosing appropriate F0 settings: Step 1, setting the floor/ceiling to extreme values for every speaker, such as 60 and 600 Hz, and Step 2, determining an individualized floor and ceiling value for each speaker based on the results of this first analysis. Pitch errors typically occur at the extreme upper and lower percentiles of a speaker's F0 distribution. Thus, to exclude the effects of outlying values obtained in Step 1, the authors suggest adapting the floor and ceiling values based on a reduced percentile range. For example, in Step 2, the pitch floor could be calculated as 0.75*q25 (where “q25” = the 25th percentile of F0 values obtained from Step 1), while the pitch ceiling could be calculated as 1.5*q75 (where “q75” = the 75th percentile from the Step 1 analysis). This technique was shown to exclude more F0 tracking errors than setting pitch parameters to larger ranges such as 100–500 Hz for female voices and 75–300 Hz for male voices. Multiple formulae of this nature have been trialed, with the most appropriate equation said to be dependent on the degree of expressivity in the speech sample (de Looze et al., 2012; de Looze & Hirst, 2008).
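The two-step procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by de Looze and Hirst: the function name `adaptive_pitch_range`, the convention that 0 marks an unvoiced frame, the example F0 values, and the choice of quartile interpolation (`method="inclusive"`) are all hypothetical.

```python
from statistics import quantiles


def adaptive_pitch_range(first_pass_f0, floor_factor=0.75, ceiling_factor=1.5):
    """Derive a speaker-specific floor and ceiling (Hz) from a wide first-pass F0 track.

    first_pass_f0: per-frame F0 values from Step 1 (assumption: 0 = unvoiced frame).
    """
    voiced = [f for f in first_pass_f0 if f > 0]
    # Quartiles of the first-pass distribution; the interpolation rule is an assumption.
    q25, _, q75 = quantiles(voiced, n=4, method="inclusive")
    return floor_factor * q25, ceiling_factor * q75


# Hypothetical Step 1 output containing one halved (51 Hz) and one doubled (230 Hz) frame.
first_pass = [0, 98, 100, 102, 105, 110, 112, 115, 120, 51, 230, 0]
floor_hz, ceiling_hz = adaptive_pitch_range(first_pass)
print(f"Step 2 floor = {floor_hz:.1f} Hz, ceiling = {ceiling_hz:.1f} Hz")
```

In this toy example, both the halved and the doubled frame fall outside the Step 2 range, which is the behavior the procedure is intended to produce. Real recordings would need the Step 1 values to come from an actual pitch tracker.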

While this discussion has focused on Praat software, it should be acknowledged that there are alternative algorithms for F0 estimation, which have also been used to estimate pitch in dysarthric speech. For example, RAPT (Robust Algorithm for Pitch Tracking; Talkin & Kleijn, 1995) and YIN (de Cheveigné & Kawahara, 2002) have been used for pitch analysis in speakers with PD (Rodríguez-Pérez et al., 2019; Verkhodanova, 2021). However, a systematic review by Strömbergsson (2016) comparing multiple F0 extraction algorithms, including the abovementioned, demonstrated that Praat is not only the most frequently used method in such research areas as Linguistics, Computer Science, Audiology, and Speech-Language Pathology, but also the most accurate. To assess accuracy, Strömbergsson (2016) compared the reference F0 contours in the speech corpus to the F0 contours extracted by Praat, RAPT, and YIN algorithms. Four evaluation metrics were used: (a) gross pitch error (the proportion of frames where the relative pitch error is higher than 20%), (b) fine pitch error (the standard deviation of relative error values distribution from the frames that do not have gross pitch error), (c) voicing decision error (the proportion of frames for which an incorrect voiced/unvoiced decision is made), and (d) F0 frame error (the proportion of frames for which an error is made). Praat's overall performance was shown to be better than that of the other two algorithms, primarily because of Praat's superior voicing detection. Praat also benefitted the most from using sex-adapted pitch settings. However, despite promising results, we are still limited in our understanding of Praat's ability to track F0 in clinical populations.
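The four evaluation metrics can be illustrated with a short Python sketch. It follows the definitions given above, but several details are assumptions rather than a description of Strömbergsson's (2016) procedure: frames are coded with 0 for unvoiced, the fine pitch error is expressed as a proportion using the population standard deviation, and the F0 frame error counts a frame as wrong if it has either a voicing error or a gross pitch error. The function name `f0_tracking_metrics` and the example contours are hypothetical.

```python
from statistics import pstdev


def f0_tracking_metrics(ref, est, threshold=0.20):
    """Compare an estimated F0 track with a reference track, frame by frame.

    ref, est: per-frame F0 in Hz (assumption: 0 marks an unvoiced frame).
    """
    n_frames = len(ref)
    both_voiced = [(r, e) for r, e in zip(ref, est) if r > 0 and e > 0]
    rel_err = [(e - r) / r for r, e in both_voiced]
    gross = [x for x in rel_err if abs(x) > threshold]
    fine = [x for x in rel_err if abs(x) <= threshold]
    # A voicing error is any frame where only one of the two tracks is voiced.
    voicing_errors = sum((r > 0) != (e > 0) for r, e in zip(ref, est))
    return {
        "gross_pitch_error": len(gross) / len(both_voiced) if both_voiced else 0.0,
        "fine_pitch_error": pstdev(fine) if fine else 0.0,
        "voicing_decision_error": voicing_errors / n_frames,
        "f0_frame_error": (voicing_errors + len(gross)) / n_frames,
    }


# Hypothetical 8-frame example with one halving, one doubling, and two voicing errors.
reference = [0, 100, 100, 100, 100, 0, 100, 100]
estimate = [0, 100, 50, 200, 102, 90, 0, 98]
metrics = f0_tracking_metrics(reference, estimate)
print(metrics)
```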

Although the studies discussed thus far offer various strategies for reducing the occurrence of F0 tracking errors, or manually removing and correcting them, the extent to which these tracking errors influence F0 summary statistics and affect research findings is still not well understood. If disordered speech signals are more prone to tracking errors, it is important to establish whether these errors affect measurements of F0 variability that are commonly recommended as clinical outcome assessments, such as F0 standard deviation (Rusz et al., 2021). Significant shifts in F0 summary statistics may call into question our understanding of normative F0 values in dysarthric speech and the extent to which these values differ from neurotypical populations.

There is also a question of what F0 summary statistics best represent the perceptual changes associated with speech disorders. In the case of dysarthria, prior studies of F0 generally report two summary statistics: mean F0 and standard deviation of F0 in Hz (e.g., Bunton et al., 2000; Goberman & Elmer, 2005). However, while these statistics are frequently used to describe F0 differences in dysarthria, neither measure is robust to outliers or deviations from the normal distribution (Wilcox & Rousselet, 2018). Since a speaker's F0 values may not necessarily be normally distributed or free from outlying values, it is possible that mean and standard deviation of F0 do not provide the most accurate representation of their overall voice quality. Another issue related to F0 measurement is the nonlinear way in which physical frequency changes are perceived (Oxenham & Plack, 1997). To be specific, listeners' sensitivity to small pitch changes is significantly higher at lower frequencies. Thus, if a speaker has a high F0 baseline, larger changes in Hertz may be required to avoid the perception of monotonicity. Alternative scales, such as semitones, can be used to represent our subjective perception of pitch intervals.
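The effect of baseline on perceived interval size can be made concrete with the standard conversion of 12 semitones per octave. The sketch below is illustrative only; the function name `interval_semitones` and the example frequencies are hypothetical.

```python
import math


def interval_semitones(f_from_hz, f_to_hz):
    """Size of the pitch interval between two frequencies, in semitones."""
    return 12 * math.log2(f_to_hz / f_from_hz)


# The same 20 Hz excursion at two different baselines.
low_baseline = interval_semitones(100, 120)   # about 3.2 semitones
high_baseline = interval_semitones(200, 220)  # about 1.7 semitones
print(f"{low_baseline:.2f} ST vs. {high_baseline:.2f} ST")
```

A 20 Hz rise therefore spans roughly twice as many semitones from a 100 Hz baseline as from a 200 Hz baseline, which is why Hz-based variability measures can understate monotonicity in higher-pitched voices.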

For these reasons, additional statistics might be helpful in the analysis of F0 to more accurately capture our perceptions of monotonicity. In addition to mean and standard deviation, the following F0 parameters have been used in the field of speech disorders to describe F0 variability: F0 variation range, relative variation range, interquartile range (IQR), semitone IQR, coefficient of variation (relative standard deviation), and semitone standard deviation. Table 1 provides citations to numerous examples of these measurements in clinical literature. Some of these measures are more robust to deviations from the normal distribution caused by extreme outlying values. For example, the IQR excludes all F0 values outside of the 25th to 75th percentiles. Other measurements have been designed to better reflect our subjective perception of pitch changes (e.g., relative standard deviation, relative range, and semitone standard deviation). One particular measurement, semitone IQR, is designed to accomplish both of these goals; however, it has not, to our knowledge, been used in prior clinical literature. Since there is no single, agreed-upon measurement for evaluating monotonicity in speakers with dysarthria, it seems appropriate to consider how tracking errors might affect our perceptions of speech expressiveness across several indices of F0 variability.

Table 1.

Statistical indices for fundamental frequency (F0) variability.

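| Statistical index | Calculation procedure | Examples in clinical research |
| --- | --- | --- |
| Range in Hz | $\mathrm{Range}(F0) = \max(F0) - \min(F0)$ | Galaz et al., 2016; Skodda et al., 2009 |
| Relative range in Hz | $\widetilde{\mathrm{Range}}(F0) = \dfrac{\max(F0) - \min(F0)}{E[F0]} \times 100$ | Galaz et al., 2016; Skodda et al., 2009 |
| Interquartile range in Hz | $\mathrm{IQR} = Q_3(F0) - Q_1(F0)$ | Lam & Tjaden, 2016; Kim et al., 2011 |
| Standard deviation | $\sigma_{F0} = \sqrt{\dfrac{1}{N-1}\sum \left(F0 - E[F0]\right)^2}$ | Bunton et al., 2000; Goberman & Elmer, 2005; Holmes et al., 2000; Skodda et al., 2011 |
| Relative standard deviation | $\tilde{\sigma}_{F0} = \dfrac{\sigma_{F0}}{E[F0]} \times 100$ | Galaz et al., 2016; Skodda et al., 2009 |
| Semitone standard deviation | $\sigma_{ST}(F0) = 39.86 \log_{10}\!\left(\dfrac{E[F0] + \sigma_{F0}}{E[F0]}\right)$ | Bowen et al., 2013; Feenaughty et al., 2014; Ramig et al., 1995 |
| Semitone IQR | $\mathrm{IQR}_{ST} = 39.86\left[\log_{10}\!\left(\dfrac{Q_3(F0)}{196}\right) - \log_{10}\!\left(\dfrac{Q_1(F0)}{196}\right)\right]$ | |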

Note. E[x] refers to the expected value (arithmetic mean) of x and Qk(x) refers to the kth quartile of x. IQR = interquartile range.
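As an illustration, the seven indices in Table 1 could be computed from a list of voiced F0 values as sketched below. This is a hedged example rather than the authors' code: the function name `f0_variability`, the quartile interpolation rule (`method="inclusive"`), and the sample values are assumptions. The constant 39.86 is taken here as 12/log10(2), the number of semitones per unit of log10 frequency ratio.

```python
import math
from statistics import mean, quantiles, stdev

ST_PER_LOG10 = 12 / math.log10(2)  # approximately 39.86


def f0_variability(f0, ref_hz=196.0):
    """Return the Table 1 indices for a list of voiced F0 values in Hz."""
    q1, _, q3 = quantiles(f0, n=4, method="inclusive")  # interpolation rule is an assumption
    m, sd = mean(f0), stdev(f0)  # stdev uses the N - 1 denominator
    f0_range = max(f0) - min(f0)
    return {
        "range_hz": f0_range,
        "relative_range": f0_range / m * 100,
        "iqr_hz": q3 - q1,
        "sd_hz": sd,
        "relative_sd": sd / m * 100,
        "semitone_sd": ST_PER_LOG10 * math.log10((m + sd) / m),
        "semitone_iqr": ST_PER_LOG10
        * (math.log10(q3 / ref_hz) - math.log10(q1 / ref_hz)),
    }


# Hypothetical F0 values (Hz) for one sentence.
indices = f0_variability([100, 110, 120, 130, 140])
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

Note that the 196 Hz reference cancels in the semitone IQR, because the difference of the two logarithms equals the logarithm of Q3/Q1; the result does not depend on the reference frequency.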

Current Study

The primary purpose of this experimental study was to evaluate the accuracy of F0 tracking in speakers with hypokinetic dysarthria due to PD and to assess how tracking errors affect F0 summary statistics. We hypothesized that excluding F0 tracking errors from acoustic analysis would have a significant effect on summary measurements of F0. To establish whether these changes were clinically meaningful, we also performed a case study to examine how different F0 measurements—with and without tracking errors—impacted the relationship between acoustic measures and listener perceptions of speech. Specifically, we were interested in the well-established relationship between F0 variability and listeners' impressions of speech expressiveness (Bänziger & Scherer, 2005; Traunmüller & Eriksson, 1995). This perceptual outcome measure was selected because emotional expression has consistently been identified as reduced in speakers with dysarthria relative to neurotypical control speakers—with people with PD judged as less happy, involved, friendly, and interested based only on their speech samples (Jaywant & Pell, 2010; Pitcairn et al., 1990). There is also evidence that changes in F0 associated with hypokinetic dysarthria can significantly influence these impressions of expressiveness (e.g., Anzuino et al., 2023; Caekebeke et al., 1991; Möbes et al., 2008; Pell et al., 2006). We hypothesized that F0 statistics that excluded tracking errors, relative to those that did not, would more strongly predict judgments of expressiveness.

Experiment 1

Experiment 1 examined the effect of F0 tracking errors on measurements of F0 that are most commonly reported in the motor speech literature: mean F0 and F0 standard deviation. The purpose was to establish whether the F0 measurements reported in the motor speech literature are significantly affected by the presence of tracking errors. As a secondary goal, we also examined how tracking errors might influence reported differences in F0 values between neurotypical and dysarthric speech in a hypothetical study where tracking errors were either included or excluded.

Method

Speech Stimuli

Institutional review board (IRB) approval for the collection and acoustic analysis of speech samples was obtained through Utah State University (IRB #11380). Speech stimuli used in this study were selected from a previously collected corpus described in Borrie et al. (2022) and included the recordings of 16 native speakers of American English including eight speakers with PD (four women and four men) aged 59 to 73 years old (M = 66.50, SD = 4.81) who were assessed as having monotone voices, and eight age- and sex-matched neurotypical control speakers (four women and four men) aged 56 to 71 years old (M = 63.13, SD = 4.80) who were assessed as having normal speech prosody. Each speaker signed an informed consent document before beginning study tasks. The speakers with PD were evaluated by two speech-language pathologists as exhibiting mild-to-moderate hypokinetic dysarthria with monotonicity as a salient feature. Both groups of speakers were prompted to read The Caterpillar passage (Patel et al., 2013) consisting of 16 sentences.

Recordings were made using a cardioid lavalier microphone placed approximately 20 cm from the speaker's mouth. Speech samples were recorded to a laptop using custom software via a Shure X2U XLR-to-USB adapter, with a sampling rate of 48 kHz and 16 bits of quantization.

Evaluating the Accuracy of F0 Tracking

The first author labeled all pitch-tracking errors through manual examination of the pitch object extracted from each sentence. Pitch values from each section of the pitch track were compared with auditory judgments of the speaker's pitch and visual analysis of the waveform and spectrogram in Praat to determine whether an error in tracking had occurred. In addition, for reliability purposes, two research assistants independently labeled 10% of the total sentences that were randomly selected for re-analysis. For both the labeling and extraction of F0 values, the current recommended Praat pitch settings were used: a ceiling of 600 Hz for female speakers and 300 Hz for male speakers. The floor value was set to 50 Hz for both groups of speakers to account for the high prevalence of vocal fry, according to pitch range recommendations (Boersma & Weenink, 2024).

To check for errors in a methodical manner, we systematically assessed for the occurrence of four main types of errors in each section of the pitch track, as described in Exner (2019): (a) not tracking F0 during voiced segments; (b) tracking F0 within unvoiced segments/pauses; (c) tracking overtone frequencies; and (d) tracking subharmonic frequencies. For errors that were identified as “F0 not tracking during voiced segments” and “F0 tracking within unvoiced segments,” we only labeled errors when they were longer than 0.05 seconds in duration. This minimum duration was implemented to avoid researchers labeling small disagreements in the onset/offset of voiced phonemes as F0 tracking errors.
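The labeling rule described above can be expressed as a small filter. The sketch below is hypothetical: the error-type names and the function `keep_label` are invented for illustration, and only the 0.05-s minimum duration for the two voicing-related error types comes from the text.

```python
MIN_VOICING_ERROR_S = 0.05  # minimum duration stated in the text

# Hypothetical labels for error types (a) and (b); types (c) and (d) have no minimum duration.
VOICING_ERROR_TYPES = {"untracked_voiced", "tracked_unvoiced"}


def keep_label(error_type, start_s, end_s):
    """Decide whether a candidate interval should be labeled as a tracking error."""
    if error_type in VOICING_ERROR_TYPES:
        return (end_s - start_s) > MIN_VOICING_ERROR_S
    return True  # overtone and subharmonic tracking are always labeled


print(keep_label("untracked_voiced", 1.00, 1.03))  # short onset/offset mismatch
print(keep_label("untracked_voiced", 1.00, 1.08))  # long enough to count
print(keep_label("subharmonic", 2.00, 2.01))       # octave errors have no minimum
```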

To evaluate the accuracy of the F0 tracking in Praat, two scripts were compared. The first script utilized the standard Praat extraction algorithm to track F0 and extract the F0 contour. The second script was designed to measure F0 without considering the segments labeled as tracking errors. Mean F0 and standard deviation were then calculated for each sentence and each speaker using the results from both scripts. The two Praat scripts analyzed the same number of total sentences (as the removal of errors never resulted in the removal of all voiced segments within a sentence). This study used a recently updated version of Praat (Version 6.4.0) for both identifying pitch tracking errors and extracting F0 values. It is important to be aware that Praat changed their default algorithm in 2023 from a raw autocorrelation method to one that applies a Gaussian low-pass filter to the audio signal before computing the autocorrelation function. Thus, pitch values extracted in this study use the newer filtered autocorrelation method and might not accurately reflect the impact or prevalence of tracking errors from other methods.
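The logic of the two-script comparison can be sketched in Python. This is not the authors' Praat code: the function name `summarize`, the representation of the contour as (time, F0) pairs, and the example values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def summarize(frames, error_intervals=()):
    """Mean and SD of F0, skipping frames inside labeled error intervals.

    frames: (time_s, f0_hz) pairs for voiced frames (hypothetical format).
    error_intervals: (start_s, end_s) pairs labeled as tracking errors.
    """
    kept = [
        f0
        for t, f0 in frames
        if not any(start <= t <= end for start, end in error_intervals)
    ]
    return mean(kept), stdev(kept)


# Hypothetical contour around 120 Hz with two halved frames and one doubled frame.
frames = [
    (0.00, 120), (0.01, 122), (0.02, 121), (0.03, 60), (0.04, 61),
    (0.05, 119), (0.06, 118), (0.07, 240), (0.08, 121), (0.09, 120),
]
labeled_errors = [(0.03, 0.04), (0.07, 0.07)]

mean_raw, sd_raw = summarize(frames)                    # analogous to the first script
mean_clean, sd_clean = summarize(frames, labeled_errors)  # analogous to the second script
print(f"uncorrected: M = {mean_raw:.1f}, SD = {sd_raw:.1f}")
print(f"corrected:   M = {mean_clean:.1f}, SD = {sd_clean:.1f}")
```

In this toy contour, the halved and doubled frames roughly cancel in the mean but inflate the standard deviation considerably, which shows why the two statistics can respond differently to octave errors.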

Reliability

To assess the reliability of our tracking error identification, an intraclass correlation coefficient (ICC) estimate was calculated in R based on a single-measures, consistency-agreement, two-way random-effects model. The model compared the duration of errors identified in each sentence by the three raters, with reliability assessed in the context of a single measure of a single labeler (i.e., we assessed the reliability of a single person's labels, which were used in the subsequent analyses of results, rather than the average duration of the three raters). The obtained ICC(C,1) = 0.68 indicated good overall reliability for a single rater (Cicchetti, 1994; McGraw & Wong, 1996).
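For readers unfamiliar with this coefficient, the single-measures consistency ICC from McGraw and Wong (1996) can be written in terms of two-way ANOVA mean squares as (MS_rows − MS_error) / (MS_rows + (k − 1) MS_error), where k is the number of raters. The sketch below computes it in plain Python as a stand-in for the R analysis; the function name `icc_c1` and the example durations are hypothetical.

```python
def icc_c1(ratings):
    """Single-measures consistency ICC for an n-by-k table.

    ratings: list of rows (one per sentence), each holding k raters' values.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical error durations (s) labeled by three raters for three sentences.
offset_only = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]  # raters differ by a constant
with_noise = [[1, 2, 3], [2, 3, 5], [4, 5, 6]]
print(icc_c1(offset_only), icc_c1(with_noise))
```

Because the consistency form ignores constant differences between raters, the first table yields a coefficient of 1 even though the raters never give identical values.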

Statistical Analysis

All results were imported into R for statistical analysis. A total of six separate mixed-effect regression models were constructed using the lme4 package. Firstly, we examined the effect of tracking errors on the F0 mean and F0 standard deviation values for each sentence. In these two models, F0 mean and F0 standard deviation were inputted as the two dependent variables, with a fixed effect of script type (i.e., including or not including errors) dummy-coded as a categorical variable. Random intercepts were included for speaker and sentence, to account for repeated measures. Following this, we partitioned our data to examine the degree to which tracking errors influence our ability to measure differences in F0 mean and standard deviation across groups (i.e., differences in F0 due to speaker sex and presence of dysarthria). This secondary analysis examined one data set that included F0 tracking errors and another that did not. Two separate models were built to measure the fixed effects of sex and speaker group on F0 mean in each data set, which were both dummy-coded as categorical variables. Then two models were built to measure the same fixed effects on F0 standard deviation in each data set. The same random intercepts were included for speaker and sentence. The two-tailed significance level was set at α = .05 for all models.
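One way to write the general form of these models is given below. The notation is an assumption introduced here for illustration: $y_{ijk}$ is the F0 statistic (mean or standard deviation) for speaker $i$, sentence $j$, and script type $k$; $\mathbf{x}_{ijk}$ holds the dummy-coded fixed effects used in a given model; and $u_i$ and $v_j$ are the random intercepts for speaker and sentence.

```latex
y_{ijk} = \beta_0 + \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + u_i + v_j + \varepsilon_{ijk},
\qquad
u_i \sim \mathcal{N}\!\left(0, \tau_{00,\text{Speaker}}\right),\quad
v_j \sim \mathcal{N}\!\left(0, \tau_{00,\text{Sentence}}\right),\quad
\varepsilon_{ijk} \sim \mathcal{N}\!\left(0, \sigma^2\right)
```

In the partitioned analyses, each data set contains a single script type, so the index $k$ drops out and the fixed effects are sex and speaker group only.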

Results

Our first set of statistical models examined whether F0 tracking errors had a statistically significant effect on measures of F0 mean and F0 standard deviation that are commonly reported in the speech pathology literature. These models are reported in Table 2. For mean F0, results demonstrated that sex (b = 51.99, SE = 13.66, p = .002) significantly influenced speakers' F0 measurements. However, neither the removal of tracking errors (b = 1.03, SE = 0.65, p = .115) nor membership in the speaker group with dysarthria (b = −10.01, SE = 13.66, p = .477) was shown to significantly change F0 means. For measurements of F0 standard deviation, all three fixed effects of tracking error removal (b = −2.05, SE = 0.50, p < .001), female sex (b = 10.82, SE = 1.73, p < .001), and speaker group with dysarthria (b = −9.88, SE = 1.73, p < .001) were shown to be statistically significant. Thus, when tracking errors were included, there was an estimated average increase of over 2 Hz in the F0 standard deviation values. The direction of the effect demonstrates that tracking errors inflate F0 variability when they are included in the calculation of F0 standard deviation.

Table 2.

Mixed-effects regression of fundamental frequency (F0) mean and standard deviation (SD) on type of script, sex, and speaker group.

| Coefficient | F0 M: Estimates | F0 M: CI | F0 SD: Estimates | F0 SD: CI |
| --- | --- | --- | --- | --- |
| Intercept | 132.26*** | [108.90, 155.61] | 22.46*** | [19.31, 25.60] |
| Type of script (corrected values without tracking errors) | 1.03 | [−0.25, 2.32] | −2.05*** | [−3.03, −1.07] |
| Sex (female) | 51.99*** | [25.16, 78.81] | 10.82*** | [7.43, 14.21] |
| Speaker group (dysarthria) | −10.01 | [−36.83, 16.82] | −9.88*** | [−13.27, −6.49] |
| Random effects | | | | |
| σ² | 54.29 | | 31.26 | |
| τ00 (Speaker) | 744.09 | | 10.91 | |
| τ00 (Sentence) | 22.70 | | 4.20 | |
| ICC | .93 | | .32 | |
| N (Speaker) | 16 | | 16 | |
| N (Sentence) | 16 | | 16 | |
| Observations | 508 | | 508 | |
| Marginal R² / Conditional R² | .460 / .964 | | .538 / .687 | |

Note. CI = confidence interval; ICC = intraclass correlation coefficient.

*** p < .001.

To understand how F0 tracking errors affect our ability to model differences between neurotypical and dysarthric speech, our second analysis investigated the effects of speaker sex and the presence of dysarthria on the same two F0 summary statistics. To accomplish this, we separated the data into two subsets, one that considered all F0 values, including errors, and one that excluded the tracking errors. We then built two separate models with either F0 mean or F0 standard deviation as a dependent variable for each subset.

Table 3 reports the results for the models with F0 mean as a dependent variable. The results show that in both data subsets, there was a significant effect of sex but no effect of dysarthria on the F0 mean measurement. The effect size of sex was reasonably similar when tracking errors were included (b = 51.13, SE = 13.43, p = .002) and when they were removed (b = 52.85, SE = 13.89, p = .002). These results can be visualized in Figure 1.

Table 3.

Mixed-effects regression of fundamental frequency (F0) mean on sex and health status.

| Coefficient | Data containing tracking errors: Estimates | Data containing tracking errors: CI (95%) | Data excluding tracking errors: Estimates | Data excluding tracking errors: CI (95%) |
| --- | --- | --- | --- | --- |
| Intercept | 132.65*** | [109.62, 155.67] | 139.70*** | [109.09, 156.68] |
| Sex (Female) | 51.13*** | [24.68, 77.59] | 52.85** | [25.49, 80.21] |
| Health status (Dysarthria) | −9.94 | [−36.40, 16.52] | −9.39 | [−37.44, 17.27] |
| Random effects | | | | |
| σ² | 54.84 | | 58.55 | |
| τ00 (Speaker) | 518.44 | | 767.96 | |
| τ00 (Sentence) | 20.91 | | 20.79 | |
| ICC | .93 | | .93 | |
| N (Speaker) | 16 | | 16 | |
| N (Sentence) | 16 | | 16 | |
| Observations | 254 | | 254 | |
| Marginal R² / Conditional R² | .461 / .963 | | .461 / .963 | |

Note. CI = confidence interval; ICC = intraclass correlation coefficient.

** p < .01. *** p < .001.

Figure 1.

The image displays 2 dot plots for the mean F0 in hertz by group, sex, and correctness of the script. The groups are neurotypical and dysarthria. The 2 sexes are male and female. Graph 1: Uncorrected script. The mean values for the neurotypical group are as follows. Male: 140. Female: 170. The mean values for the dysarthria group are as follows. Male: 130. Female: 165. Graph 2: Corrected script. The mean values for the neurotypical group are as follows. Male: 135. Female: 170. The mean values for the dysarthria group are as follows. Male: 130. Female: 165.

Effects of sex and speaking group on fundamental frequency (F0) mean for two script types. Note the similarity between the two graphs. Uncorrected script = data including tracking errors; corrected script = data excluding tracking errors.

In Table 4, the results for the models with F0 standard deviation as a dependent variable are summarized. The results demonstrate that, in both data subsets, there was a significant effect of sex on F0 standard deviation, but the effect was substantially larger when tracking errors were included (b = 12.46, SE = 1.94, p < .001) than when they were removed (b = 9.19, SE = 1.72, p < .001). As can be seen in Figure 2, female speakers experienced a larger inflation of their F0 standard deviation as a result of tracking errors, which is likely to explain why their removal reduced these sex-related differences. The effect of dysarthria on F0 standard deviation values was also statistically significant both when tracking errors were included in the data set (b = −9.39, SE = 1.94, p < .001) and when they were removed (b = −10.37, SE = 1.72, p < .001). However, in contrast to the sex effects, differences between dysarthric and neurotypical speech were larger (by roughly 1 Hz) following the removal of tracking errors.

Table 4.

Mixed-effects regression of fundamental frequency (F0) standard deviation on sex and health status.

| Coefficient | Data containing tracking errors: Estimates | Data containing tracking errors: CI (95%) | Data excluding tracking errors: Estimates | Data excluding tracking errors: CI (95%) |
| --- | --- | --- | --- | --- |
| Intercept | 21.39*** | [17.96, 24.82] | 21.47*** | [18.40, 24.54] |
| Sex (Female) | 12.46*** | [8.64, 16.27] | 9.19*** | [5.81, 12.57] |
| Health status (Dysarthria) | −9.39*** | [−13.21, −5.58] | −10.37*** | [−13.75, −6.99] |
| Random effects | | | | |
| σ² | 38.85 | | 23.45 | |
| τ00 (Speaker) | 12.55 | | 10.30 | |
| τ00 (Sentence) | 3.58 | | 3.63 | |
| ICC | .29 | | .37 | |
| N (Speaker) | 16 | | 16 | |
| N (Sentence) | 16 | | 16 | |
| Observations | 254 | | 254 | |
| Marginal R² / Conditional R² | .525 / .664 | | .561 / .725 | |

Note. CI = confidence interval; ICC = intraclass correlation coefficient.

*** p < .001.

Figure 2.

The image displays two line graphs for the standard deviation of F0 in hertz by gender, group, and correctness of script. Graph 1: Uncorrected script. The data for the neurotypical group are as follows. Male: 21. Female: 34. The data for the dysarthria group are as follows. Male: 12. Female: 24. Graph 2: Corrected script. The data for the neurotypical group are as follows. Male: 21. Female: 30. The data for the dysarthria group are as follows. Male: 11. Female: 20.

Effects of sex and speaking group on fundamental frequency (F0) standard deviation (SD) for two script types. Uncorrected script = data including tracking errors; corrected script = data excluding tracking errors.

Experiment 2

The goal of Experiment 2 was to examine the effect of tracking errors in a specific context and see how the inclusion of tracking errors would influence the results of a hypothetical scientific study. Thus, Experiment 2 examined the influence of F0 changes on a clinically relevant perceptual outcome measure, listeners' impressions of speech expressiveness. The results of this case study were compared under two scenarios: one in which the hypothetical study included tracking errors in its measurements of F0 variability and one in which the tracking errors were removed before the calculation of F0 variability. In addition to the typically used measurement of F0 standard deviation, we explored several other summary statistics that have been suggested as alternative procedures for measuring F0 variability (see Table 1). Specifically, we were interested in determining whether the removal of pitch-tracking errors strengthened the relationship between acoustic and perceptual measures. IRB approval for the collection of listener ratings was obtained through Utah State University (IRB # 12958). All listeners signed an informed consent document before beginning study tasks.

Method

Measuring Perception of Speech Expressiveness

All recordings were scaled to have an average intensity of 70 dB for consistency in the perceptual experiment. Five sentences from each of the 16 speakers were selected for the perceptual experiment based on two criteria: all sentences had to be declarative and of similar length (M = 13.20, SD = 3.49). The same set of five sentences was included for each of the 16 speakers, resulting in a total of 80 sentences.

Twenty native speakers of American English (14 women and six men) aged 18 to 34 years old (M = 22, SD = 4.05) completed the listening task. Listeners were recruited from the Utah State University community and received a gift card or course credit for their participation. Listeners reported that they had normal hearing with no known history of speech, language, or hearing difficulties and all listeners passed a pure-tone hearing screening at 20 dB hearing level for 500, 1000, 2000, and 4000 Hz in both ears. Listeners were asked to answer the question, “How expressive is this speaker?” using a visual analog scale via a customized MATLAB program with extreme positions being “not expressive” and “very expressive.” Expressiveness was defined as “how dynamic and animated the speaker is.” During the experiment, listeners were seated alone, in a quiet room, and presented with sentences via Sennheiser HD 598 closed-back headphones. Prior to beginning the perceptual experiment, listeners were presented with two example sentences to allow them to adjust the volume to the level they felt comfortable with and familiarize themselves with the task interface. Then listeners were prompted to provide one rating following the presentation of each sentence. Sentences were the same for each listener, but the order of presentation was randomized. In total, the task took approximately 20 min for listeners to complete.

To correlate the perception of expressiveness with F0 variability, seven summary statistics were calculated—F0 range, F0 IQR, F0 relative range, F0 standard deviation, F0 relative standard deviation, F0 semitone standard deviation, and F0 IQR in semitones. These statistical indices were calculated according to the formulas in Table 1.
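As an illustration of how the seven statistics might be computed for one sentence, the following is a minimal Python sketch. Table 1 is not reproduced in this section, so the definitions used here are commonly used ones and are assumptions: range as maximum minus minimum, relative measures divided by the mean F0, and semitones computed relative to the sentence mean. The function names and the handling of unvoiced frames are also hypothetical and may differ from the Praat scripts used in the study.

```python
import numpy as np

def hz_to_semitones(f0_hz, ref_hz):
    """Convert F0 values in Hz to semitones relative to ref_hz."""
    return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz)

def f0_variability(f0_hz):
    """Return seven F0 variability statistics for one sentence.

    The definitions are assumed, not copied from Table 1 of the article.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[np.isfinite(f0) & (f0 > 0)]  # drop unvoiced or undefined frames
    mean = f0.mean()
    q1, q3 = np.percentile(f0, [25, 75])
    f0_range = f0.max() - f0.min()
    sd = f0.std(ddof=1)  # sample standard deviation
    st = hz_to_semitones(f0, ref_hz=mean)
    st_q1, st_q3 = np.percentile(st, [25, 75])
    return {
        "range": f0_range,
        "iqr": q3 - q1,
        "relative_range": f0_range / mean,
        "sd": sd,
        "relative_sd": sd / mean,
        "semitone_sd": st.std(ddof=1),
        "semitone_iqr": st_q3 - st_q1,
    }
```

Because the semitone transform only shifts values by a constant when the reference changes, the semitone SD and semitone IQR in this sketch do not depend on the choice of reference frequency.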

Reliability

To assess the reliability of listener ratings of expressiveness, an ICC estimate was calculated in R using the irr package based on an average rating, consistency-agreement, two-way, random-effects model. Reliability was assessed in the context of an average rating across 20 random listeners, as the average listener rating is what was subsequently analyzed in our results. The obtained ICC(C,20) = 0.96 indicated excellent overall reliability of the expressiveness ratings (Cicchetti, 1994; McGraw & Wong, 1996).
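For readers unfamiliar with this coefficient, the sketch below computes an average-rating, consistency, two-way ICC from mean squares, following the ICC(C,k) definition in McGraw and Wong (1996). It is a NumPy illustration only; the study itself used the irr package in R, and the function name and input layout (rows as sentences, columns as listeners) are assumptions.

```python
import numpy as np

def icc_consistency_average(ratings):
    """ICC(C,k): two-way model, consistency, average of k raters.

    ratings: 2-D array with one row per rated item and one column per rater.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows
```

Under the consistency definition, listeners who differ only by a constant offset (e.g., one listener who always rates 10 points higher) still yield a coefficient of 1, because rater main effects are excluded from the error term.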

Statistical Analysis

Mixed-effects models were used to examine listeners' ratings of expressiveness using the lme4 package. To assess whether different F0 variability statistics improve our ability to predict perceptual ratings of expressiveness, we built 14 separate linear mixed-effects models: one model for each of the seven F0 variability measures (range, relative range, IQR, standard deviation, relative standard deviation, standard deviation in semitones, and IQR in semitones) in each of the two data subsets (all F0 values including tracking errors vs. data excluding tracking errors). In these models, perceptual ratings were entered as a dependent variable; sex, dysarthria, and the F0 variability measure of interest were entered as fixed effects, while speaker, sentence, and listener were included as random intercepts to account for repeated measures. Prior to their input as independent variables, all F0 variability measurements were standardized, with a mean of zero and variance of one, so their effect sizes could be easily interpreted and compared across models. Again, the two-tailed significance level was set at α = .05 for all models.
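The model set described above can be sketched as follows. This Python snippet only enumerates the 14 model specifications and shows the standardization step; it does not fit any model. The column names (rating, sex, dysarthria, speaker, sentence, listener) are hypothetical, the formula strings follow lme4's random-intercept syntax as an illustration rather than the authors' actual code, and the use of the sample standard deviation (ddof=1) is an assumption.

```python
from itertools import product

import numpy as np

MEASURES = ["range", "relative_range", "iqr", "sd",
            "relative_sd", "semitone_sd", "semitone_iqr"]
SUBSETS = ["with_errors", "without_errors"]

def zscore(values):
    """Standardize to mean 0 and (sample) variance 1."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

def model_formula(measure):
    """lme4-style formula string; all column names are hypothetical."""
    return (f"rating ~ sex + dysarthria + {measure}_z"
            " + (1 | speaker) + (1 | sentence) + (1 | listener)")

# One specification per (data subset, variability measure) pair
specs = [(subset, measure, model_formula(measure))
         for subset, measure in product(SUBSETS, MEASURES)]

for subset, measure, formula in specs:
    print(subset, "|", formula)
```

Standardizing each predictor before fitting means every F0 coefficient is expressed per 1-SD change in that measure, which is what allows estimates to be compared across models with predictors on different scales (Hz, ratios, semitones).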

Results

Table 5 summarizes the results for the data subset that included tracking errors. The results show that for the F0 statistics, which include Praat F0 tracking errors, five out of seven acoustic measurements had a statistically significant effect on listener ratings of expressiveness—F0 IQR, F0 semitone IQR, F0 standard deviation, F0 relative standard deviation, and F0 semitone standard deviation. As expected, across all models there was also a significant effect of dysarthria on the perception of speech expressiveness, with a 20%–25% reduction in total levels of expressiveness attributable to the presence of hypokinetic dysarthria.

Table 5.

Fixed effects of seven fundamental frequency (F0) variability statistics that include tracking errors on the perception of speech expressiveness.

Perceptual ratings of expressiveness

Model    Variability statistic  Estimate (SE)     Speaker group (dysarthria) (SE)  Sex (female) (SE)  AIC
Model 1  F0 range               −0.77 (−0.75)     −25.9*** (−5.53)                 8.23 (−5.58)       6677.75
Model 2  F0 IQR                 2.93** (−0.91)    −23.7*** (−5.37)                 5.69 (−5.41)       6668.33
Model 3  F0 semitone IQR        2.87*** (−0.68)   −23.4*** (−5.34)                 7.74 (−5.31)       6661.51
Model 4  F0 relative range      0.52 (−0.57)      −26.0*** (−5.53)                 8.94 (−5.52)       6678.53
Model 5  F0 SD                  3.62*** (−0.97)   −23.7*** (−5.33)                 4.08 (−5.45)       6664.77
Model 6  F0 relative SD         2.72*** (−0.68)   −24.0*** (−5.32)                 7.04 (−5.31)       6663.34
Model 7  F0 semitone SD         2.72*** (−0.69)   −24.0*** (−5.32)                 7.03 (−5.31)       6663.66

Note. Model estimates are expressed as a percentage of change across the total length of the visual analog scale. F0 measurements were standardized, so each of their estimates represents the percentage of change attributable to a 1-SD increase in their value. Standard errors are included in brackets. IQR = interquartile range; AIC = Akaike information criterion.

** p < .01. *** p < .001.

Table 6 presents results for the data subset that excluded tracking errors. In these models, all seven acoustic measurements were statistically significant predictors of listener impressions, and all effect sizes were larger than the corresponding models in Table 5. Again, as expected, there was a significant effect of dysarthria on expressiveness ratings in all models. Lower Akaike information criterion scores also demonstrated that model fit was better for the models that were run using data that excluded F0 tracking errors. Figure 3 highlights changes in effect sizes as a result of tracking errors and the different statistics used to capture F0 variability.

Table 6.

Fixed effects of seven fundamental frequency (F0) variability statistics that exclude tracking errors on the perception of speech expressiveness.

Perceptual ratings of expressiveness

Model    Variability statistic  Estimate (SE)     Speaker group (dysarthria) (SE)  Sex (female) (SE)  AIC
Model 1  F0 range               3.87*** (−0.77)   −22.11*** (−5.59)                5.26 (−5.58)       6655.45
Model 2  F0 IQR                 3.20*** (−0.90)   −23.34*** (−5.36)                5.45 (−5.39)       6666.13
Model 3  F0 semitone IQR        3.20*** (−0.69)   −22.92*** (−5.32)                7.72 (−5.28)       6657.71
Model 4  F0 relative range      2.84*** (−0.64)   −23.12*** (−5.60)                8.88 (−5.56)       6659.93
Model 5  F0 SD                  8.32*** (−1.01)   −16.55*** (−5.38)                −0.56 (−5.38)      6615.37
Model 6  F0 relative SD         6.30*** (−0.78)   −17.92*** (−5.47)                6.88 (−5.37)       6616.79
Model 7  F0 semitone SD         6.37*** (−0.79)   −17.77*** (−5.47)                6.82 (−5.37)       6616.85

Note. Model estimates are expressed as a percentage of change across the total length of the visual analog scale. F0 measurements were standardized, so each of their estimates represents the percentage of change attributable to a 1-SD increase in their value. Standard errors are included in brackets. IQR = interquartile range; AIC = Akaike information criterion.

*** p < .001.

Figure 3.

A bar graph for the effect size across different statistics by condition. The conditions are: with errors and no errors. The statistics are F0 range, F0 IQR, F0 IQR (semitones), F0 relative range, F0 SD, F0 relative SD, and F0 SD (semitones). The data for the with errors condition are as follows. F0 range: negative 0.6. F0 IQR: 2.7. 2 asterisks are marked. F0 IQR (semitones): 2.7. 3 asterisks are marked. F0 relative range: 0.5. F0 SD: 3.5. 3 asterisks are marked. F0 relative SD: 2.6. 3 asterisks are marked. F0 SD (semitones): 2.6. 3 asterisks are marked. The data for the no errors condition are as follows. F0 range: 3.8. 3 asterisks are marked. F0 IQR: 3.0. 3 asterisks are marked. F0 IQR (semitones): 3.0. 3 asterisks are marked. F0 relative range: 2.7. 3 asterisks are marked. F0 SD: 8.0. 3 asterisks are marked. F0 relative SD: 6.2. 3 asterisks are marked. F0 SD (semitones): 6.2. 3 asterisks are marked.

Comparison of effect sizes as a function of tracking errors and the different statistics used to capture fundamental frequency (F0) variability. Effect size represents the total percentage of change across the visual analog scale attributable to a 1-SD increase in the F0 variability measurement. IQR = interquartile range; Rel. = relative.
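The comparison between the two sets of models can be made explicit with a short calculation. The Python sketch below uses the F0 estimates and AIC values reported in Tables 5 and 6 and computes, for each statistic, the change in estimate and the AIC difference between the model that includes tracking errors and the model that excludes them. The variable and function names are illustrative only.

```python
# (estimate, AIC) pairs copied from Table 5 (with errors) and Table 6 (without errors)
with_errors = {
    "F0 range": (-0.77, 6677.75),
    "F0 IQR": (2.93, 6668.33),
    "F0 semitone IQR": (2.87, 6661.51),
    "F0 relative range": (0.52, 6678.53),
    "F0 SD": (3.62, 6664.77),
    "F0 relative SD": (2.72, 6663.34),
    "F0 semitone SD": (2.72, 6663.66),
}
without_errors = {
    "F0 range": (3.87, 6655.45),
    "F0 IQR": (3.20, 6666.13),
    "F0 semitone IQR": (3.20, 6657.71),
    "F0 relative range": (2.84, 6659.93),
    "F0 SD": (8.32, 6615.37),
    "F0 relative SD": (6.30, 6616.79),
    "F0 semitone SD": (6.37, 6616.85),
}

def compare(with_err, without_err):
    """Return the estimate change and AIC difference for each statistic."""
    rows = {}
    for name, (est_w, aic_w) in with_err.items():
        est_wo, aic_wo = without_err[name]
        rows[name] = {
            "estimate_change": est_wo - est_w,
            "delta_aic": aic_w - aic_wo,  # positive = lower AIC without errors
        }
    return rows

comparison = compare(with_errors, without_errors)
for name, row in comparison.items():
    print(f"{name:18s} estimate change {row['estimate_change']:+.2f}"
          f"   delta AIC {row['delta_aic']:.2f}")
```

In this calculation every AIC difference is positive, and the largest differences occur for the three standard deviation measures, consistent with the pattern shown in Figure 3.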

Discussion

This two-experiment study explored the measurement of F0 variability in speakers with hypokinetic dysarthria and neurotypical control speakers. The aim of the first experiment was to examine whether F0 tracking errors caused statistically significant changes in F0 summary statistics, including F0 mean and F0 standard deviation. We also explored how tracking errors could influence reported differences in F0 values between neurotypical and dysarthric speech in cases where they were included or not included in data analysis. The goal of our second experiment was to examine the effect of tracking errors in a specific context and see how the inclusion of tracking errors would influence the results of a hypothetical scientific study. In this experiment, we examined the relationship between F0 variability statistics and perceptual ratings of speech expressiveness. Overall, we found that speech stimuli from both speakers with dysarthria and age-matched, neurotypical adults contained F0 tracking errors that caused statistically significant changes in F0 standard deviation when using Praat's standard pitch settings. In contrast, there was no significant effect of tracking errors on F0 mean values. In further investigating F0 variability, our case study evidence suggests that changes due to tracking errors reduce estimated differences between dysarthric and neurotypical speech, as well as the relationship between F0 variability and perceived expressiveness. This discussion considers whether these changes in effect size should be regarded as clinically important, an issue that depends largely on the population being examined and the purpose of a given study.

The first experiment found that a failure to remove F0 tracking errors resulted in statistically significant differences in the standard deviation of F0. The outcome was higher measures of F0 standard deviation, with an estimated average increase of 2 Hz across speakers. F0 mean values were not significantly altered. This finding is not unexpected, as tracking errors tend to result in extreme outlier values that occur in both directions (i.e., inaccurate F0 values that are too low and F0 values that are too high). When outliers occur in both directions, they may wash each other out in terms of their impact on the mean. However, in the case of standard deviation, the presence of outliers in both directions increases the overall spread of the data, leading to greater standard deviation values. In summary, these results suggest that tracking errors can systematically inflate measurements of F0 variability, like F0 standard deviation, in a statistically significant manner. However, whether a 2-Hz increase in F0 standard deviation is meaningful in clinical research is debatable. When expressing the 2-Hz error as a percentage of the speaker's total F0 standard deviation, it represents a much larger number for male speakers with dysarthria (accounting for an approximate 17% change in their F0 standard deviation) than neurotypical females (accounting for around a 7% change). Thus, whether a 2-Hz inflation of F0 standard deviation is acceptable may depend on the populations being examined and the purpose of the study (e.g., if normative data on a reading passage is being reported, inflated F0 standard deviation values may be more problematic).
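The reasoning about outliers in both directions can be illustrated with a small numerical example. The F0 values in the Python sketch below are invented for illustration and are not taken from the study data; the two added "errors" are deliberately placed symmetrically around the mean, which is an idealization of what real tracking errors do.

```python
import numpy as np

# Hypothetical error-free F0 samples (Hz) for one utterance; mean is 120 Hz
clean = np.array([110.0, 115.0, 118.0, 120.0, 122.0, 125.0, 130.0])

# Same samples plus two hypothetical tracking errors, one too low and one too high
with_errors = np.append(clean, [60.0, 180.0])

clean_mean, clean_sd = clean.mean(), clean.std(ddof=1)
error_mean, error_sd = with_errors.mean(), with_errors.std(ddof=1)

print(f"mean without errors: {clean_mean:.1f} Hz, with errors: {error_mean:.1f} Hz")
print(f"SD without errors:   {clean_sd:.1f} Hz, with errors: {error_sd:.1f} Hz")
```

Here the two erroneous values cancel in the mean but both add to the sum of squared deviations, so the standard deviation rises while the mean is unchanged. With real data the cancellation would be only approximate.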

Running separate mixed-effects models for data sets that included or excluded tracking errors provided evidence that changes in F0 standard deviation that occur due to tracking errors can alter the estimated size of acoustic differences between speaker groups. When tracking errors were removed from the data, the effect of speaker group (i.e., change due to PD and associated hypokinetic dysarthria) was stronger and the standard error was reduced. However, the effects of dysarthria on F0 standard deviation remained statistically significant regardless of whether tracking errors were removed. Previous literature has consistently suggested that hypokinetic dysarthria has a significant effect on F0 standard deviation, with lower F0 variability observed in speakers with dysarthria (Bowen et al., 2013; Skodda et al., 2011). The current results support this finding (even when tracking errors are included) but also suggest the size of these F0 standard deviation differences might be dampened when F0 tracking errors are not removed from the data set. In our case study, the inclusion of tracking errors reduced the effects of hypokinetic dysarthria by close to 10% (as reported in Table 3)—a factor that may be meaningful in clinical research, where small differences in F0 standard deviation are useful in determining treatment goals and documenting progress. In contrast, our analysis of mean F0 revealed that the presence of hypokinetic dysarthria had no significant effect on F0 means regardless of whether tracking errors were included in the analysis.

To explore whether tracking errors could affect the results of a hypothetical study, Experiment 2 investigated the relationship between F0 variability and perceptual ratings of speech expressiveness. This perceptual outcome measurement was of interest because speakers with dysarthria are often said to exhibit reduced emotional expression within their voices, and F0 standard deviation has already been established as an important acoustic indicator of reduced expressiveness (Bänziger & Scherer, 2005; Traunmüller & Eriksson, 1995). Of note, our descriptive findings of expressiveness revealed that speakers with dysarthria associated with PD were rated as less expressive than healthy speakers by around 20%. This finding supports previous literature indicating that speakers with PD exhibit decreased emotional expression, even within relatively neutral (i.e., nonemotional) speaking contexts (Anzuino et al., 2023; Caekebeke et al., 1991; Möbes et al., 2008; Pell et al., 2006). These findings are suggested to be due to changes in the control of laryngeal musculature associated with PD, as the dopaminergic deficiency causes bradykinesia and muscle rigidity, reducing the speaker's ability to adjust vocal fold length (Ma et al., 2020).

Seven different F0 variability statistics (standard deviation, relative standard deviation, standard deviation in semitones, F0 range, relative range, IQR, and IQR in semitones) were used to characterize the relationship between F0 variability and expressiveness scores. As expected, when tracking errors were removed, all F0 variability statistics showed a statistically significant positive relationship with expressiveness scores (i.e., sentences with greater F0 range and variability were consistently rated as more expressive by the listeners). In contrast, when errors were not removed, only five of the F0 variability measurements demonstrated a significant relationship with expressiveness scores. The two measurements that were not statistically significant were F0 range and relative range, with F0 range exhibiting a negative relationship with expressiveness ratings prior to the removal of F0 tracking errors. This finding is relatively unsurprising. If tracking errors from subharmonic and overtone frequencies are the highest and lowest values present in a sentence, an analysis of F0 range (i.e., maximum F0–minimum F0) would be based solely on these errors, and not include any real F0 values produced by the speaker, resulting in nonsignificant results.
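A small worked example may help show why F0 range is so sensitive to this problem. In the Python sketch below, all values are hypothetical: one sentence has a wide, error-free F0 contour, and another has a nearly flat contour plus one octave-halving and one octave-doubling error of the kind produced by subharmonic and overtone tracking.

```python
import numpy as np

def f0_range(f0):
    """Maximum minus minimum F0 (Hz)."""
    return float(np.max(f0) - np.min(f0))

def f0_iqr(f0):
    """Interquartile range of F0 (Hz)."""
    q1, q3 = np.percentile(f0, [25, 75])
    return float(q3 - q1)

# Hypothetical expressive sentence with no tracking errors
expressive = np.array([100.0, 115.0, 130.0, 145.0, 160.0])

# Hypothetical flat sentence; the tracker halves 118 Hz and doubles 122 Hz once each
flat_true = np.array([118.0, 119.0, 120.0, 121.0, 122.0])
flat_tracked = np.append(flat_true, [59.0, 244.0])

for label, f0 in [("expressive", expressive),
                  ("flat (true)", flat_true),
                  ("flat (tracked)", flat_tracked)]:
    print(f"{label:15s} range {f0_range(f0):6.1f} Hz   IQR {f0_iqr(f0):5.1f} Hz")
```

In this example the range of the flat sentence is determined entirely by the two erroneous values, so it exceeds the range of the expressive sentence, whereas the IQR still ranks the expressive sentence as more variable.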

In examining the strength of the relationship between acoustic and perceptual measures, we found that all effect sizes were increased when tracking errors were removed. The relationship between listener ratings and F0 standard deviation values, in particular, appeared to be most affected by the exclusion of tracking errors. As mentioned in the introduction, standard deviation values are not robust to outliers and can be easily skewed by a single, highly deviating value. In our case, the effect size for F0 standard deviation was more than doubled when tracking errors were removed from the analysis. Interestingly, although IQR measures were found to be more robust to outliers, as evidenced by the relatively small difference between effect sizes with and without tracking errors in their calculation, the IQR measures did not generally outperform standard deviation measures when errors were present. Indeed, the IQR measures had only a moderate relationship with listeners' perceptions of expressiveness regardless of whether tracking errors were present in the data. In contrast, the F0 standard deviation measurement demonstrated a strong relationship with listener ratings when errors were removed. The reason why standard deviation measures were particularly effective at indexing listener perceptions of speech may be due to the nature of the measurement. Unlike IQR and F0 range measures, F0 standard deviation considers every data point, helping it to capture the full range of variability in the data set.

Limitations and Future Directions

Our study provides one example of how acoustic–perceptual relationships can be altered by the removal of tracking errors. However, it is important to recognize that this finding is specific to our speech samples, and we do not know if the same effects would occur in other populations of speakers. Our experiment provided evidence that the removal of tracking errors can change the strength of relationships between acoustic measures of F0 standard deviation and perceptual ratings of expressiveness. However, the design of this case study does not allow us to make statistical inferences and generalize this finding to a wider population.

An additional issue stems from the fact that sections of speech containing tracking errors were completely removed from our analysis, rather than being manually corrected. It is possible that the removal of speech data—in addition to the errors themselves—may have caused some systematic changes in F0 values. For example, if tracking errors occurred more often in areas of speech that contained particularly high or low F0 values, the removal of these sections of speech could deflate F0 variation—perhaps in some speakers more than others. Thus, it is possible that the act of removing certain speech segments, regardless of whether they contained errors in F0 estimates, could be influencing the results of this study. A better understanding of where different pitch-tracking errors occurred across speakers could help to elucidate this issue.

In future research, the effect of F0 tracking errors on additional perceptual outcome measures, including the salient features of hypokinetic dysarthria such as monotonicity and reduced stress, as well as perceived listening effort, would be useful to explore. Indeed, perceived listening effort has consistently been found to be increased in speakers with dysarthria (Fletcher et al., 2019, 2022; Stipancic et al., 2021; Whitehill & Wong, 2006), and reduced F0 variability in PD has recently been suggested as a factor that might influence these perceptions (Van Der Bruggen et al., 2024). Whether these additional perceptual outcomes are affected by the presence/absence of tracking errors would further elucidate the impact of these errors in both clinical and research efforts. It may be helpful to also include an analysis of the nature of F0 tracking errors in future efforts so we can better understand if there are differences in the type of errors made when analyzing neurotypical and dysarthric speech. This type of labeling will allow us to further explore which errors are more frequent in which group of speakers.

It is also important to note that the number and type of pitch-tracking errors may be heavily dependent on the pitch-tracking algorithm that is used. Thus, it is unclear how applicable the findings of this study are to other pitch-tracking algorithms, including even alternative methods and parameterizations of pitch tracking in Praat. Since Praat changed their default algorithm in 2023, studies that used the earlier version may not experience the same frequency or type of tracking errors as those found in the present study. Variations in other estimation parameters, such as pitch floor/ceiling or voicing/silence thresholds, could also influence the accuracy of the tracking algorithm.

One final issue in the current study was the combined analysis of the modal register together with creaky voice. Low F0 values in creaky voice are not a result of tracking errors. However, the presence of creaky voice is likely to affect the distribution of F0 values across sentences—potentially increasing the range and variability in F0 values. For measures of F0 variability to be meaningful, we must be clear on what type of F0 variation we are attempting to index. In the speech disorder literature, the perception of monotonicity is typically related to a flattening of the F0 contour in the modal voice register, while the presence of vocal fry or creaky voice is considered a separate feature of voice quality (Duffy, 2019). However, if F0 variation is measured without removing creaky voice, a flattening of F0 in the modal register might be difficult to detect. For example, if a speaker has a flat F0 contour in their modal range, but high levels of vocal fry, they may appear to have large variability in F0 values. Thus, in future work, it might be useful to look only at the modal register, excluding creaky voice and lower frequencies from the analysis.

Conclusions

In this two-experiment study, we highlighted the challenges associated with automated measurements of F0 and provided evidence that a common method of extracting F0 values produces errors that inflate F0 standard deviation values. In a follow-up case study, we also showed that these tracking errors have the potential to reduce the strength of acoustic–perceptual relationships, which may be of particular interest in clinical research.

Data Availability Statement

The Praat scripts used in this study are available at: https://github.com/AnnaliseFletcher/Praat-Scripts-for-Assessing-Fundamental-Frequency. Our speech recordings are not publicly available to protect participants' confidentiality. However, anonymized data may be available from the corresponding author on reasonable request.

Acknowledgments

This research was supported by the National Institute on Deafness and Other Communication Disorders Grant R01DC020713 (Principal Investigator: Stephanie Borrie).

Publisher Note: This article is part of the Special Issue: Select Papers From the 2024 Conference on Motor Speech—Basic Science and Clinical Innovation.

Footnote

1. As of Version 6.4.0, Praat still uses the raw autocorrelation method as default for scripts that contain the “To Pitch” function. However, values from the filtered autocorrelation method can be obtained by scripting with the To Pitch (filtered ac) function.

References

  1. Anzuino, I., Baglio, F., Pelizzari, L., Cabinio, M., Biassoni, F., Gnerre, M., Blasi, V., Silveri, M. C., & Di Tella, S. (2023). Production of emotions conveyed by voice in Parkinson's disease: Association between variability of fundamental frequency and gray matter volumes of regions involved in emotional prosody. Neuropsychology, 37(8), 883–894. 10.1037/neu0000912 [DOI] [PubMed] [Google Scholar]
  2. Bänziger, T., & Scherer, K. R. (2005). The role of intonation in emotional expressions. Speech Communication, 46(3–4), 252–267. 10.1016/j.specom.2005.02.016 [DOI] [Google Scholar]
  3. Boersma, P., & Weenink, D. (2024). Praat: Doing phonetics by computer [Computer software]. https://www.praat.org/
  4. Borrie, S. A., Wynn, C. J., Berisha, V., & Barrett, T. S. (2022). From speech acoustics to communicative participation in dysarthria: Toward a causal framework. Journal of Speech, Language, and Hearing Research, 65(2), 405–418. 10.1044/2021_JSLHR-21-00306 [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Bowen, L. K., Hands, G. L., Pradhan, S., & Stepp, C. E. (2013). Effects of Parkinson's disease on fundamental frequency variability in running speech. Journal of Medical Speech-Language Pathology, 21(3), 235–244. [PMC free article] [PubMed] [Google Scholar]
  6. Brabenec, L., Mekyska, J., Galaz, Z., & Rektorova, I. (2017). Speech disorders in Parkinson's disease: Early diagnostics and effects of medication and brain stimulation. Journal of Neural Transmission, 124(3), 303–334. 10.1007/s00702-017-1676-0 [DOI] [PubMed] [Google Scholar]
  7. Brockmann-Bauser, M., & de Paula Soares, M. F. (2023). Do we get what we need from clinical acoustic voice measurements? Applied Sciences, 13(2), Article 941. 10.3390/app13020941 [DOI] [Google Scholar]
  8. Bunton, K. (2006). Fundamental frequency as a perceptual cue for vowel identification in speakers with Parkinson's disease. Folia Phoniatrica et Logopaedica, 58(5), 323–339. 10.1159/000094567 [DOI] [PubMed] [Google Scholar]
  9. Bunton, K., Kent, R. D., Kent, J. F., & Duffy, J. R. (2001). The effects of flattening fundamental frequency contours on sentence intelligibility in speakers with dysarthria. Clinical Linguistics & Phonetics, 15(3), 181–193. 10.1080/02699200010003378 [DOI] [Google Scholar]
  10. Bunton, K., Kent, R. D., Kent, J. F., & Rosenbek, J. C. (2000). Perceptuo-acoustic assessment of prosodic impairment in dysarthria. Clinical Linguistics & Phonetics, 14(1), 13–24. 10.1080/026992000298922 [DOI] [PubMed] [Google Scholar]
  11. Caekebeke, J. F., Jennekens-Schinkel, A., van der Linden, M. E., Buruma, O. J., & Roos, R. A. (1991). The interpretation of dysprosody in patients with Parkinson's disease. Journal of Neurology, Neurosurgery & Psychiatry, 54(2), 145–148. 10.1136/jnnp.54.2.145 [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Cantero, S. F. J., & Font-Rotchés, D. (2020). Melodic Analysis of Speech (MAS). Phonetics of intonation. In J. Abasolo, P. Irati, & A. Ensunza (Eds.), Contributions on education (pp. 20–47). Universidad del País Vasco. [Google Scholar]
  13. Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6(4), 284–290. 10.1037/1040-3590.6.4.284 [DOI] [Google Scholar]
  14. de Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917–1930. 10.1121/1.1458024 [DOI] [PubMed] [Google Scholar]
  15. de Looze, C., Ghio, A., Scherer, S., Pouchoulin, G., & Viallet, F. (2012). Automatic analysis of the prosodic variations in Parkinsonian read and semi-spontaneous speech. In Speech Prosody 2012 (pp. 71–74). International Speech Communication Association. 10.21437/SpeechProsody.2012-21 [DOI] [Google Scholar]
  16. de Looze, C., & Hirst, D. (2008, April). Detecting changes in key and range for the automatic modelling and coding of intonation. In Speech Prosody 2008 (pp. 135–138). International Speech Communication Association. [Google Scholar]
  17. Duffy, J. R. (2019). Motor speech disorders: Substrates, differential diagnosis, and management (4th ed.). Mosby. [Google Scholar]
  18. Exner, A. H. (2019). The effects of speech tasks on the prosody of people with Parkinson disease [Master's thesis, Purdue University Graduate School]. 10.25394/PGS.9936275.v1 [DOI] [Google Scholar]
  19. Exner, A. H., Francis, A. L., MacPherson, M. K., Darling-White, M., & Huber, J. E. (2023). The effects of speech task on lexical stress in Parkinson's disease. American Journal of Speech-Language Pathology, 32(2), 506–522. 10.1044/2022_AJSLP-22-00185 [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Feenaughty, L., Tjaden, K., & Sussman, J. (2014). Relationship between acoustic measures and judgments of intelligibility in Parkinson's disease: A within-speaker approach. Clinical Linguistics & Phonetics, 28(11), 857–878. 10.3109/02699206.2014.921839 [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Fletcher, A. R., Risi, R. A., Wisler, A., & McAuliffe, M. J. (2019). Examining listener reaction time in the perceptual assessment of dysarthria. Folia Phoniatrica et Logopaedica, 71(5–6), 297–308. 10.1159/000499752 [DOI] [PubMed] [Google Scholar]
  22. Fletcher, A. R., Wisler, A. A., Gruver, E. R., & Borrie, S. A. (2022). Beyond speech intelligibility: Quantifying behavioral and perceived listening effort in response to dysarthric speech. Journal of Speech, Language, and Hearing Research, 65(11), 4060–4070. 10.1044/2022_JSLHR-22-00136 [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Galaz, Z., Mekyska, J., Mzourek, Z., Smekal, Z., Rektorova, I., Eliasova, I., Kostalova, M., Mrackova, M., & Berankova, D. (2016). Prosodic analysis of neutral, stress-modified and rhymed speech in patients with Parkinson's disease. Computer Methods and Programs in Biomedicine, 127, 301–317. 10.1016/j.cmpb.2015.12.011 [DOI] [PubMed] [Google Scholar]
  24. Goberman, A. M., & Elmer, L. W. (2005). Acoustic analysis of clear versus conversational speech in individuals with Parkinson disease. Journal of Communication Disorders, 38(3), 215–230. 10.1016/j.jcomdis.2004.10.001
  25. Harel, B. T., Cannizzaro, M. S., Cohen, H., Reilly, N., & Snyder, P. J. (2004). Acoustic characteristics of Parkinsonian speech: A potential biomarker of early disease progression and treatment. Journal of Neurolinguistics, 17(6), 439–453. 10.1016/j.jneuroling.2004.06.001
  26. Holmes, R. J., Oates, J. M., Phyland, D. J., & Hughes, A. J. (2000). Voice characteristics in the progression of Parkinson's disease. International Journal of Language & Communication Disorders, 35(3), 407–418. 10.1080/136828200410654
  27. Jaywant, A., & Pell, M. D. (2010). Listener impressions of speakers with Parkinson's disease. Journal of the International Neuropsychological Society, 16(1), 49–57. 10.1017/S1355617709990919
  28. Kent, R. D., Weismer, G., Kent, J. F., Vorperian, H. K., & Duffy, J. R. (1999). Acoustic studies of dysarthric speech: Methods, progress, and potential. Journal of Communication Disorders, 32(3), 141–186. 10.1016/S0021-9924(99)00004-0
  29. Kim, Y., Kent, R. D., & Weismer, G. (2011). An acoustic study of the relationships among neurologic disease, dysarthria type, and severity of dysarthria. Journal of Speech, Language, and Hearing Research, 54(2), 417–429. 10.1044/1092-4388(2010/10-0020)
  30. Kovac, D., Mekyska, J., Aharonson, V., Harar, P., Galaz, Z., Rapcsak, S., Orozco-Arroyave, J. R., Brabenec, L., & Rektorova, I. (2024). Exploring digital speech biomarkers of hypokinetic dysarthria in a multilingual cohort. Biomedical Signal Processing and Control, 88(Pt. B), Article 105667. 10.1016/j.bspc.2023.105667
  31. Lam, J., & Tjaden, K. (2016). Clear speech variants: An acoustic study in Parkinson's disease. Journal of Speech, Language, and Hearing Research, 59(4), 631–646. 10.1044/2015_JSLHR-S-15-0216
  32. Lennes, M., Stevanovic, M., Aalto, D., & Palo, P. (2016). Comparing pitch distributions using Praat and R. Phonetician, 111(2), 35–53.
  33. Lieberman, P., & Michaels, S. B. (1962). Some aspects of fundamental frequency and envelope amplitude as related to the emotional content of speech. The Journal of the Acoustical Society of America, 34(7), 922–927. 10.1121/1.1918222
  34. Ma, A., Lau, K. K., & Thyagarajan, D. (2020). Voice changes in Parkinson's disease: What are they telling us? Journal of Clinical Neuroscience, 72, 1–7. 10.1016/j.jocn.2019.12.029
  35. Marcoux, K., & Ernestus, M. (2019). Pitch in native and non-native Lombard speech. In S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019) (pp. 2605–2609). Australasian Speech Science and Technology Association Inc., and International Phonetic Association.
  36. McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30–46. 10.1037/1082-989X.1.1.30
  37. Möbes, J., Joppich, G., Stiebritz, F., Dengler, R., & Schröder, C. (2008). Emotional speech in Parkinson's disease. Movement Disorders, 23(6), 824–829. 10.1002/mds.21940
  38. Oxenham, A. J., & Plack, C. J. (1997). A behavioral measure of basilar-membrane nonlinearity in listeners with normal and impaired hearing. The Journal of the Acoustical Society of America, 101(6), 3666–3675. 10.1121/1.418327
  39. Patel, R., Connaghan, K., Franco, D., Edsall, E., Forgit, D., Olsen, L., Ramage, L., Tyler, E., & Russell, S. (2013). “The Caterpillar”: A novel reading passage for assessment of motor speech disorders. American Journal of Speech-Language Pathology, 22(1), 1–9. 10.1044/1058-0360(2012/11-0134)
  40. Pell, M. D., Cheang, H. S., & Leonard, C. L. (2006). The impact of Parkinson's disease on vocal-prosodic communication from the perspective of listeners. Brain and Language, 97(2), 123–134. 10.1016/j.bandl.2005.08.010
  41. Pitcairn, T. K., Clemie, S., Gray, J. M., & Pentland, B. (1990). Impressions of parkinsonian patients from their recorded voices. International Journal of Language & Communication Disorders, 25(1), 85–92. 10.3109/13682829009011965
  42. Ramig, L. O., Countryman, S., Thompson, L. L., & Horii, Y. (1995). Comparison of two forms of intensive speech treatment for Parkinson disease. Journal of Speech and Hearing Research, 38(6), 1232–1251. 10.1044/jshr.3806.1232
  43. Rodríguez-Pérez, P., Fraile, R., García-Escrig, M., Sáenz-Lechón, N., Gutiérrez-Arriola, J. M., & Osma-Ruiz, V. (2019). A transversal study of fundamental frequency contours in parkinsonian voices. Biomedical Signal Processing and Control, 51, 374–381. 10.1016/j.bspc.2019.02.021
  44. Rusz, J., Tykalova, T., Ramig, L. O., & Tripoliti, E. (2021). Guidelines for speech recording and acoustic analyses in dysarthrias of movement disorders. Movement Disorders, 36(4), 803–814. 10.1002/mds.28465
  45. Skodda, S., Grönheit, W., & Schlegel, U. (2011). Intonation and speech rate in Parkinson's disease: General and dynamic aspects and responsiveness to levodopa admission. Journal of Voice, 25(4), e199–e205. 10.1016/j.jvoice.2010.04.007
  46. Skodda, S., Rinsche, H., & Schlegel, U. (2009). Progression of dysprosody in Parkinson's disease over time—A longitudinal study. Movement Disorders, 24(5), 716–722. 10.1002/mds.22430
  47. Stipancic, K. L., Palmer, K. M., Rowe, H. P., Yunusova, Y., Berry, J. D., & Green, J. R. (2021). “You say severe, I say mild”: Toward an empirical classification of dysarthria severity. Journal of Speech, Language, and Hearing Research, 64(12), 4718–4735. 10.1044/2021_JSLHR-21-00197
  48. Strömbergsson, S. (2016). Today's most frequently used F0 estimation methods, and their accuracy in estimating male and female pitch in clean speech. In Interspeech 2016 (pp. 525–529). International Speech Communication Association. 10.21437/Interspeech.2016-240
  49. Talkin, D., & Kleijn, W. B. (1995). A Robust Algorithm for Pitch Tracking (RAPT). In W. B. Kleijn & K. K. Paliwal (Eds.), Speech coding and synthesis (pp. 495–518). Elsevier.
  50. Taylor, P. (1994). The rise/fall/connection model of intonation. Speech Communication, 15(1–2), 169–186. 10.1016/0167-6393(94)90050-7
  51. Traunmüller, H., & Eriksson, A. (1995). The perceptual evaluation of F0 excursions in speech as evidenced in liveliness estimations. The Journal of the Acoustical Society of America, 97(3), 1905–1915. 10.1121/1.412942
  52. Tykalova, T., Rusz, J., Cmejla, R., Ruzickova, H., & Ruzicka, E. (2014). Acoustic investigation of stress patterns in Parkinson's disease. Journal of Voice, 28(1), 129.e1–129.e8. 10.1016/j.jvoice.2013.07.001
  53. Van Der Bruggen, S., De Letter, M., & Rietveld, T. (2024). Effects of near-monotonous speech of persons with Parkinson's disease on listening effort and intelligibility. Clinical Linguistics & Phonetics, 38(10), 935–948. 10.1080/02699206.2023.2272032
  54. Verkhodanova, V. (2021). Acoustic change over time in speech of one bilingual individual with Parkinson's disease. OSF. 10.17605/OSF.IO/9BSQY
  55. Vogel, A. P., Maruff, P., Snyder, P. J., & Mundt, J. C. (2009). Standardization of pitch-range settings in voice acoustic analysis. Behavior Research Methods, 41(2), 318–324. 10.3758/BRM.41.2.318
  56. Whitehill, T. L., & Wong, C. C. Y. (2006). Contributing factors to listener effort for dysarthric speech. Journal of Medical Speech-Language Pathology, 14(4), 335–342.
  57. Wilcox, R. R., & Rousselet, G. A. (2018). A guide to robust statistical methods in neuroscience. Current Protocols in Neuroscience, 82, 8.42.1–8.42.30. 10.1002/cpns.41

Data Availability Statement

The Praat scripts used in this study are available at: https://github.com/AnnaliseFletcher/Praat-Scripts-for-Assessing-Fundamental-Frequency. Our speech recordings are not publicly available to protect participants' confidentiality. However, anonymized data may be available from the corresponding author on reasonable request.


Articles from Journal of Speech, Language, and Hearing Research : JSLHR are provided here courtesy of American Speech-Language-Hearing Association