Abstract
Introduction:
The purpose of the study was to assess acoustic measures of fundamental frequency (fo), standard deviation of fo (SD of fo), jitter%, shimmer%, noise-to-harmonic ratio (NHR), smoothed cepstral peak prominence (CPPS), and acoustic voice quality index (AVQI) analyzed through multiple Praat versions automatically by VoiceEvalU8 or manually by two raters. In addition, default settings to calculate CPPS in two Praat versions manually analyzed by two raters were compared to Maryn and Weenink20 procedures for CPPS automatically analyzed by VoiceEvalU8.
Methods:
Nineteen vocally healthy females used VoiceEvalU8 to record three 5-s sustained /a/ trials, the all voiced phrase “we were away a year ago,” and a 15-s speech sample twice a day for five consecutive days. Two raters manually completed acoustic analysis using different versions of Praat and compared that analysis to measures automatically generated through a version of Praat used by VoiceEvalU8. One-way analyses of variance were run for all acoustic measures with post-hoc testing by the Bonferroni method. For acoustic measures that demonstrated significant differences, intraclass correlation coefficients were conducted.
Results:
Results showed no significant differences across automatic and manual analysis for different versions of Praat for all acoustic measures during /a/, for fo, jitter%, shimmer%, and NHR during the phrase, for jitter%, shimmer%, NHR, and CPPS during speech, and for AVQI calculated from both sustained /a/ and the phrase. The default Praat settings for CPPS were not significantly different from the Maryn and Weenink20 procedures for sustained /a/ and speech. Significant differences were present for SD of fo and CPPS during the phrase and fo and SD of fo during speech. SD of fo and CPPS in the phrase were moderately correlated and fo and SD of fo during speech demonstrated good to excellent correlations across the different versions of Praat.
Conclusions:
Acoustic measures analyzed through sustained /a/ and some of the acoustic measures during the phrase and speech were not different across multiple versions of Praat. Automatic analysis by VoiceEvalU8 produced similar mean values as compared to manual analysis by two raters. Even though SD of fo and CPPS in the phrase and fo and SD of fo in speech were different across the versions of Praat, the measures demonstrated moderate to excellent reliability.
Keywords: telepractice, telehealth, voice, acoustics, Praat
Introduction
According to the 2012 National Health Interview Survey, voice disorders are the most reported communication impairment1 with an estimated 1 in 13 adults in the United States having a voice disorder annually.2 A voice disorder is defined as “altered voice quality, pitch, loudness, or vocal effort that impairs communication … and/or affects quality of life.”3,p.3 The etiology of a voice disorder can be organic (i.e., a structural or neurological change that affects the respiratory, laryngeal, or vocal tract mechanisms) or functional (i.e., extensive or improper use of the vocal mechanism).4 Individuals who are suspected of having a voice disorder are assessed by a multidisciplinary team that includes an otolaryngologist and a speech-language pathologist (SLP).5,6 Otolaryngologists assess clients by collecting a case history, performing a physical examination, and visualizing the larynx using laryngoscopy to determine a diagnosis and the best course of treatment.3,6 SLPs assess clients by collecting a clinical history and acoustic, perceptual (including client reported outcome measures), and aerodynamic measures to determine the function of the vocal mechanism, limitations caused by the impairment, impact of the impairment on quality of life, and stimulability for voice therapy.4–7 Acoustic, perceptual, and aerodynamic measures are components of a comprehensive voice assessment.6 Acoustic measures determine vocal amplitude, frequency, and quality of the voice signal.4 Clients are asked to record sustained phonation and speech tasks. These recordings are then analyzed by voice analysis software. Perceptual measures provide auditory-perceptual ratings of the client’s voice and the impact the voice problem has on the client’s quality of life. Aerodynamic measures capture the client’s ability to manage airflow with vocal fold phonation.4
To capture these measures before the COVID-19 pandemic, SLPs required clients to travel to their office to complete the voice assessment in-person. The in-person voice assessment session may take 1–2 hours to complete, which does not include data analysis and write-up of the report. Typically, only two “snapshots” of the client’s voice are provided (i.e., before and after voice rehabilitation). A voice assessment conducted only before and after voice rehabilitation does not provide an accurate representation of the effects of vocal loading, or vocal use, throughout a day of talking.5,8,9 A method is needed to collect acoustic, perceptual, and aerodynamic measures throughout a day of vocal loading that captures a realistic, functional picture of the client’s voice in their environment without requiring an in-person visit. A solution may involve smartphone application (app) technology. Thus, VoiceEvalU8, a smartphone/tablet app, Health Insurance Portability and Accountability Act (HIPAA)-compliant server, and web portal, was created to assess voice in the client’s environment in a way that is easily accessible to clients and SLPs either in-person or through telepractice (i.e., audiology and speech-language pathology services delivered at a distance), meeting COVID-19 physical distancing recommendations and stay-at-home orders.
During the COVID-19 pandemic, televoice evaluation tools were a necessity to continue comprehensive voice evaluations. VoiceEvalU8 provides the option to complete an evaluation either in-person or through synchronous and/or asynchronous telepractice. Synchronous use of VoiceEvalU8 involves the client, with the clinician present, completing a log session in real time through videoconferencing, whereas asynchronous use of VoiceEvalU8 implies that the client completes the log sessions on their own with no SLP present. Using VoiceEvalU8, it is possible to capture acoustic, perceptual, and aerodynamic measures before and after a day of vocal loading (see video demonstration of VoiceEvalU8).8–10 The measures are captured through sustained phonation, speech tasks, and survey questions on the client’s smartphone or tablet. Each log session takes 5 to 8 minutes to complete in the environment where the client normally communicates. Capturing data in a client’s naturalistic environment is recommended by the World Health Organization’s International Classification of Functioning, Disability, and Health.11,12 Clients download the VoiceEvalU8 app onto their smartphone or tablet and log in with a username provided by the SLP and a unique password created by the client. For each assigned day, a client will complete one session in the morning and one session in the evening. The app then sends the recorded data to the HIPAA-compliant server, where it is encrypted and stored and where the acoustic measures are analyzed via algorithms from Praat, a free voice analysis software program available on the World Wide Web.13 SLPs are able to create log sessions, add clients as app users, and access all the results in the VoiceEvalU8 web portal.
Video Demonstration of VoiceEvalU8
To capture acoustic measures, VoiceEvalU8 prompts a user to record three 5-s trials of sustained /a/, the phrase “we were away a year ago,” and a 15-s speech sample during each log session. These tasks were adapted from the Consensus Auditory Perceptual Evaluation of Voice (CAPE-V).8 The CAPE-V is “a clinical and research tool developed to promote a standardized approach to evaluating and documenting auditory-perceptual judgments of voice quality.”14,p.124 VoiceEvalU8 uses the microphone on the client’s smartphone or tablet to record each trial at a 44,000 Hz sampling rate. Results from a study by Grillo et al5 found that no significant differences were present for acoustic measures captured simultaneously across multiple trials by a head-mounted microphone, two versions of an Apple iPhone, and an Android smartphone. The microphones on the smartphones agreed with the typically used head-mounted microphone for acoustic measure analysis. VoiceEvalU8 utilizes algorithms from Praat version 6.0.19 June 2016 to automatically analyze the acoustic measures on the server as .wav files, streamlining the analysis process for the SLP. The Praat source code and algorithms are available from the Praat website at https://www.fon.hum.uva.nl/praat/.
The literature has compared Praat to Analysis of Dysphonia in Speech and Voice (ADSV) (Kay Elemetrics Corporation, Lincoln Park, NJ, USA) with a focus on smoothed cepstral peak prominence (CPPS) in people with and without voice disorders.15,16 Cepstral-based measures have previously been found to be the strongest predictor of overall voice severity in dysphonic speakers, above and beyond other acoustic measures,17,18 and cepstral-based measures were also recently proposed as one objective acoustic correlate of voice quality that SLPs should use at minimum when evaluating patients with voice disorders.19 Watts et al15 investigated Flemish and English sustained /a/ and speech in one or two sentences across dysphonic and non-dysphonic voices for CPPS from Praat (version not specified) and ADSV. The sentence in English was the all voiced, “we were away a year ago.” From the description in the Methods section, it was not clear if the two Flemish sentences were all voiced or not. The results indicated that even though the CPPS values were different due to differences in the algorithms for Praat (i.e., using Maryn and Weenink20 procedures) and ADSV, the values were highly correlated. The results were divided by Flemish and English, not by presence of dysphonia; therefore, the mean and standard deviation values reported in Watts et al15 include both people with and without dysphonia. Within each language, it would have been interesting to analyze and report CPPS values by non-dysphonic and dysphonic participants. Sauder et al16 investigated the ability of CPPS to predict voice disorder diagnosis for both Praat and ADSV and the relationship of CPPS between Praat and ADSV.
Praat version 6.0.17 was used to analyze the second sentence of the Rainbow Passage, “The rainbow is a division of white light into many beautiful colors.” The CPPS values calculated from Praat and ADSV were highly correlated, while CPPS derived from Praat was uniquely predictive of voice disorder diagnosis above and beyond CPPS from ADSV. The default settings of Praat were used to calculate CPPS, not the Maryn and Weenink20 procedures.
In addition, the literature has reported that the Multi-Dimensional Voice Program (MDVP) (Kay Elemetrics Corporation, Lincoln Park, NJ, USA) and Praat (version not specified) produce equivalent mean fundamental frequency (fo), whereas jitter and shimmer differ significantly.21–24 Conversely, Oguz et al25 analyzed 47 normal and pathological voices with MDVP and Praat (version not specified) and found no significant differences in shimmer calculated by the two programs. The different acoustic parameters produced by MDVP and Praat are attributable to differences between the algorithms used by the two programs.23
To our knowledge, the literature has not evaluated multiple versions of Praat. Praat and its algorithms are continually updated with new versions posted on the Praat website. Clinicians may not be aware of a new version and will continue to use the older version that was originally downloaded onto their computer; therefore, it is important to assess acoustic measures across multiple versions of Praat. In addition, automatic analysis of acoustic measures through Praat offers a user-friendly and efficient alternative to manual analysis. App technology linked to the automatic analysis provides options for clients to complete a voice evaluation at a distance. VoiceEvalU8 provides the automatic analysis through Praat algorithms and options to complete a voice evaluation either in-person or through synchronous and asynchronous telepractice. The purpose of the current study was to investigate whether differences exist between acoustic measures manually acquired by two raters using versions 6.0.46 from January 2019, 6.1.05 from October 2019, and 6.0.32 from September 2017 compared to acoustic measures automatically generated by VoiceEvalU8 using version 6.0.19 from June 2016. In addition, default settings in each version of Praat were used by the two raters to manually calculate CPPS, which were compared to the Maryn and Weenink20 procedures for calculating CPPS used in Praat version 6.0.19 by VoiceEvalU8. Default settings for CPPS were chosen to represent the clinical environment by the manual analysis of the two raters without requiring a computer script or adaptation of the settings, which may be cumbersome and difficult for routine clinical use. VoiceEvalU8 calculates CPPS automatically through the Maryn and Weenink20 procedures, which is different from the default settings. A comparison of the differences in the default settings and the Maryn and Weenink20 procedures was necessary.
Methods
Participants
Nineteen vocally healthy females participated in this study, which was approved by West Chester University’s Institutional Review Board. Before all study procedures, participants reviewed and signed a consent form. All participants who provided consent met the following inclusion criteria: American English-speaking females between the ages of 20–30, owner of either an Android or iOS smartphone, and vocally healthy as determined by no current voice complaints and no abnormal voice patterns perceptually judged by the first author. The participants used the VoiceEvalU8 app twice a day for five consecutive days. A session was completed in the morning (i.e., between 6–11 a.m.) and in the evening (i.e., between 4–11 p.m.) on each of the days. During each of the sessions, the participants were prompted to produce three 5-s trials of sustained /a/, the phrase “we were away a year ago,” and a 15-s speech sample. The participants used the VoiceEvalU8 app on their iOS or Android smartphone to record acoustic .wav files at a 44,000 Hz sampling rate. In all utterances for the acoustic measures, the participants were instructed to be in a quiet room, measure a mouth-to-microphone distance of 4 cm with the plastic stick, and use their everyday speaking voice. The acoustic files from Day 1, Day 4, and Day 5 in the morning and in the evening were analyzed for this study. Day 2 was excluded from analysis because of a high prevalence of participants not completing their log sessions that day. Day 3 was not analyzed because the server that day did not run the files through the Praat algorithms due to a corrupt file.
Praat Analysis
Two raters, R1 and R2, manually completed the acoustic analysis using Praat. Both raters individually downloaded each of the nineteen participants’ acoustic .wav files from the VoiceEvalU8 web portal. Each rater downloaded and analyzed 702 files. The acoustic measures that were analyzed in this study were fo, standard deviation of the fo (SD of fo), jitter%, shimmer%, CPPS, noise-to-harmonic ratio (NHR), and acoustic voice quality index (AVQI) (see Grillo et al5 for a description of each measure). The raters used different versions of the Praat software to analyze the acoustic files. R1 used version 6.0.46 from January 2019 for all measures, except CPPS. A newer version of Praat, 6.1.05 from October 2019, was used for CPPS due to the presence of a bug in the calculation of CPPS found in version 6.0.46. R2 used version 6.0.32 from September 2017.
The raters manually uploaded each .wav file into Praat to analyze the different acoustic measures. fo, SD of fo, jitter%, shimmer%, and NHR were analyzed using the Praat voice report feature. After a .wav file was uploaded, the raters viewed the file as a waveform (i.e., visual display of sound amplitude over time) and a wideband spectrogram (i.e., frequency and amplitude by time). For the three 5-s sustained /a/ trials, the middle 4-s portion of the vowel was analyzed, trimming the first and last 0.5 s to eliminate on- and off-set of phonation. For the phrase and speech, only the speech portion was analyzed, deleting all silent intervals at the beginning and end of the recording. The raters then highlighted the entire area of voicing on the waveform and spectrogram. The voice report was then selected under the “Pulses” tab, providing an acoustic analysis of fo, SD of fo, jitter%, shimmer%, and NHR. CPPS and AVQI were not provided in the voice report and required different methods for acoustic analysis. To analyze CPPS, a PowerCepstrogram file was created from the original acoustic .wav file by selecting “To PowerCepstrogram” under the “Analyse periodicity” tab. From the PowerCepstrogram file, CPPS was calculated by selecting “Get CPPS” under the “Query” tab. The default settings of Praat for calculation of CPPS were used for each of the versions utilized by the two raters. Default settings were used to represent methods used in the typical clinical environment and to also compare the typical default Praat settings for CPPS to the CPPS procedures described in Maryn and Weenink,20 which VoiceEvalU8 uses for automatic analysis of CPPS. For AVQI, /a/ trial 2 and the phrase “we were away a year ago” were utilized. The /a/ trial 2 and the phrase files were renamed “sv” and “cs,” respectively, before being uploaded to Praat. After both files were uploaded, the Praat script created by Maryn and Weenink20 was utilized to calculate AVQI.
The results from both raters were added into Excel spreadsheets for each of the measures across sustained /a/, the phrase “we were away a year ago,” and the 15-s speech sample.
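For readers less familiar with the perturbation measures in the voice report, the following is a minimal sketch of how local jitter% and shimmer% are commonly defined: the mean absolute difference between consecutive cycle periods (or cycle amplitudes), divided by the mean period (or amplitude), expressed as a percentage. The function name and the cycle values are hypothetical and are not data from this study, and the sketch omits the period and amplitude plausibility limits that Praat applies before computing its voice report.

```python
from statistics import mean


def local_perturbation_percent(values):
    """Mean absolute difference between consecutive values,
    relative to the mean value, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical cycle-by-cycle data (illustration only, not study data).
periods_s = [0.0046, 0.0047, 0.0046, 0.0045, 0.0046]  # cycle durations in seconds
peak_amplitudes = [0.81, 0.79, 0.80, 0.82, 0.80]      # per-cycle peak amplitudes

jitter_percent = local_perturbation_percent(periods_s)         # period perturbation
shimmer_percent = local_perturbation_percent(peak_amplitudes)  # amplitude perturbation

print(f"jitter% = {jitter_percent:.2f}, shimmer% = {shimmer_percent:.2f}")
```

Because both measures depend on cycle-to-cycle detection, small differences in how a software version marks the glottal pulses can change the resulting values.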
VoiceEvalU8 Analysis
VoiceEvalU8 uses Praat algorithms, version 6.0.19 from June 2016, to automatically analyze the acoustic measures. For CPPS and AVQI, the procedures and methods described by Maryn and Weenink20 were used. These measures are analyzed through automation scripts and stored on the VoiceEvalU8 server. The automation script for the three 5-s /a/ trials trims the first and last 0.5 s from the recording, analyzing only the middle portion of the vowel (4 s) and eliminating on- and off-set of phonation. In addition, for the phrase and the 15-s speech sample, the automation script trims the beginning and ending of the recording to include only speech. For each of the nineteen participants, the acoustic measures of fo, SD of fo, jitter%, shimmer%, CPPS, NHR, and AVQI were recorded from the VoiceEvalU8 web portal by R1 or R2. These measures were then added to the corresponding Excel spreadsheets.
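The trimming step for the sustained /a/ trials can be illustrated with a short sketch. It assumes a trim of 0.5 s at each end, which is the amount that leaves the middle 4 s of a 5-s trial, and the 44,000 Hz sampling rate reported for the recordings. The function name is hypothetical; this is not the VoiceEvalU8 automation script, whose source is not reproduced here.

```python
def middle_segment(samples, sampling_rate_hz, trim_s=0.5):
    """Return the samples that remain after dropping trim_s seconds
    from the start and from the end of a recording."""
    n_trim = int(round(trim_s * sampling_rate_hz))
    if 2 * n_trim >= len(samples):
        raise ValueError("recording is too short for the requested trim")
    return samples[n_trim:len(samples) - n_trim]


sampling_rate_hz = 44_000                     # rate reported for the recordings
recording = [0.0] * (5 * sampling_rate_hz)    # placeholder for one 5-s /a/ trial
vowel = middle_segment(recording, sampling_rate_hz)
duration_s = len(vowel) / sampling_rate_hz    # length of the analyzed portion

print(f"analyzed portion: {duration_s} s")
```

The phrase and speech recordings would need a different approach (detecting and removing leading and trailing silence), which this sketch does not attempt.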
Statistical Analysis
IBM SPSS version 24 was used to complete statistical analysis. The dependent variables (DV) were the seven acoustic measures calculated by VoiceEvalU8, R1, and R2 (i.e., fo, SD of fo, jitter%, shimmer%, NHR, CPPS, and AVQI). The independent variable (IV) was analysis method (i.e., automatic by VoiceEvalU8, manual by R1 and R2, and all with different versions of Praat). A majority of the DVs were not normally distributed; therefore, the non-parametric Kruskal-Wallis one-way analysis of variance (ANOVA) was conducted to determine statistically significant differences for all seven DVs with a p-value of <0.05. Means and standard deviations of the acoustic measures were provided for descriptive analysis. If the one-way ANOVA was significant for a DV, then post-hoc testing was completed by the Bonferroni method to determine where the difference occurred among these three comparisons: 1) VoiceEvalU8 (Praat version 6.0.19 and Maryn and Weenink20 methods for CPPS and AVQI) vs. R1 (Praat version 6.1.05 for CPPS, version 6.0.46 for the other acoustic measures, and Maryn and Weenink20 for AVQI), 2) VoiceEvalU8 vs. R2 (Praat version 6.0.32 for all acoustic measures and Maryn and Weenink20 for AVQI), and 3) R1 vs. R2. Considering the three comparisons for post hoc testing, the p-value was adjusted for the Bonferroni method, which happens automatically in version 24 of SPSS. For acoustic measures that demonstrated significant differences, intraclass correlation coefficients (ICC) with a two-way mixed model and absolute agreement were conducted to determine the reliability among the IV of analysis method with p<0.05 to determine significance. Although there are no standards to interpret ICCs, ICCs below 0.5 have been suggested as indicative of poor reliability, ICCs of 0.5–0.75 as moderately reliable, ICCs of 0.75–0.9 as having good reliability, and ICCs above 0.9 as having excellent reliability.26
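The omnibus test and the p-value adjustment described above can be sketched in a few lines. This is an illustration under stated assumptions, not the SPSS procedure used in the study: it computes the tie-corrected Kruskal-Wallis H statistic for exactly three groups, uses the fact that a chi-square distribution with 2 degrees of freedom has the tail probability exp(-H/2), and applies a plain Bonferroni adjustment (each p-value multiplied by the number of comparisons, capped at 1). All function names and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three groups, with the standard tie correction."""
    groups = [a, b, c]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    # Three groups give df = 2, where the chi-square tail equals exp(-H / 2).
    return h, math.exp(-h / 2)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical values for one measure from three analysis methods.
method_a = [10.1, 11.3, 12.2]
method_b = [12.9, 13.4, 14.0]
method_c = [15.2, 15.9, 16.5]
h_stat, p_value = kruskal_wallis_three_groups(method_a, method_b, method_c)
adjusted = bonferroni([0.01, 0.5, 0.004])  # hypothetical unadjusted pairwise p-values
```

The closed-form p-value only holds for three groups; with a different number of groups a general chi-square tail function would be needed.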
Results
Sustained /a/
For all three 5-s trials of sustained /a/, the IV of analysis method was not significant for fo, SD of fo, jitter%, shimmer%, NHR, and CPPS. Table 1 includes means, standard deviations, and p-values for the acoustic measures during the three /a/ trials. The different versions of Praat that were used automatically by VoiceEvalU8 and manually by the two raters yielded acoustic measures that were not significantly different for all three sustained /a/ trials. In addition, CPPS mean values for the default settings in the Praat versions used by R1 and R2 and the automatic analysis of the Maryn and Weenink20 procedures in VoiceEvalU8 demonstrated no differences during the three sustained /a/ trials.
Table 1.
Acoustic measure (females) | VoiceEvalU8 M | VoiceEvalU8 SD | R1 M | R1 SD | R2 M | R2 SD | p |
---|---|---|---|---|---|---|---|
Trial 1 | |||||||
fo (Hz) | 214 | 3.6 | 217 | 3.6 | 216 | 3.6 | 0.92 |
SD of fo (Hz) | 10.5 | 1.5 | 10.1 | 1.5 | 11.5 | 1.5 | 0.63 |
Jitter% | 0.60 | 0.04 | 0.56 | 0.04 | 0.58 | 0.04 | 0.98 |
Shimmer% | 4.84 | 0.32 | 4.91 | 0.32 | 5.03 | 0.32 | 0.94 |
NHR | 0.04 | 0.01 | 0.04 | 0.01 | 0.04 | 0.01 | 0.98 |
CPPS (dB) | 13.1 | 0.28 | 12.6 | 0.28 | 12.4 | 0.28 | 0.11 |
Trial 2 | |||||||
fo (Hz) | 212 | 3.8 | 217 | 3.8 | 217 | 3.8 | 0.89 |
SD of fo (Hz) | 12.5 | 1.6 | 11.3 | 1.6 | 12.9 | 1.6 | 0.88 |
Jitter% | 0.60 | 0.05 | 0.53 | 0.05 | 0.59 | 0.05 | 0.96 |
Shimmer% | 5.03 | 0.36 | 5.03 | 0.36 | 5.23 | 0.36 | 0.98 |
NHR | 0.04 | 0.01 | 0.03 | 0.01 | 0.04 | 0.01 | 0.95 |
CPPS (dB) | 13.4 | 0.26 | 12.9 | 0.26 | 12.8 | 0.26 | 0.09 |
Trial 3 | |||||||
fo (Hz) | 212 | 4.2 | 216 | 4.2 | 216 | 4.2 | 0.88 |
SD of fo (Hz) | 12.4 | 1.6 | 12.2 | 1.6 | 12.6 | 1.6 | 0.95 |
Jitter% | 0.59 | 0.04 | 0.53 | 0.04 | 0.53 | 0.04 | 0.89 |
Shimmer% | 5.05 | 0.35 | 4.74 | 0.35 | 4.74 | 0.35 | 0.96 |
NHR | 0.04 | 0.01 | 0.03 | 0.01 | 0.03 | 0.01 | 0.44 |
CPPS (dB) | 13.4 | 0.25 | 13.3 | 0.25 | 13.1 | 0.25 | 0.48 |
Note. N=19. Analysis of variance (ANOVA); Rater 1 (R1) and Rater 2 (R2).
p<0.05
Phrase
For the phrase “we were away a year ago,” the IV of analysis method was not significant for fo, jitter%, shimmer%, NHR, and AVQI. Table 2 includes means, standard deviations, and p-values for the acoustic measures during the phrase and speech. The phrase is one of the speech tasks utilized to calculate AVQI; therefore, the measure was included in Table 2. The IV of analysis method was significant for SD of fo and CPPS in the phrase. Because of significance for SD of fo and CPPS, post hoc pairwise comparisons using the Bonferroni method were used to further assess VoiceEvalU8 vs. R1, VoiceEvalU8 vs. R2, and R1 vs. R2 (see Table 3). For SD of fo, VoiceEvalU8 and R1 were equivalent, but there were significant differences between VoiceEvalU8 vs. R2 (p = 0.002) and R1 vs. R2 (p = 0.001). From post-hoc testing, the difference in SD of fo was driven by the version of Praat used by R2 (6.0.32 from September 2017). Praat version 6.0.19 from June 2016 used by VoiceEvalU8 was equivalent to Praat version 6.0.46 from January 2019 used by R1 in the calculation of SD of fo during the phrase. For CPPS, R1 and R2 were the same, but there were significant differences between VoiceEvalU8 vs. R1 (p < 0.00) and VoiceEvalU8 vs. R2 (p < 0.00). From post-hoc testing, the difference in mean CPPS during the phrase was driven by Praat version 6.0.19 from June 2016 used by VoiceEvalU8 with the Maryn and Weenink20 procedures. The default settings for CPPS in the Praat versions used manually by R1 (6.1.05 from October 2019) and R2 (6.0.32 from September 2017) produced similar mean CPPS values.
Table 2.
Acoustic measure (females) | VoiceEvalU8 M | VoiceEvalU8 SD | R1 M | R1 SD | R2 M | R2 SD | p |
---|---|---|---|---|---|---|---|
Phrase | |||||||
fo (Hz) | 195 | 2.9 | 204 | 2.9 | 201 | 2.9 | 0.11 |
SD of fo (Hz) | 43.2 | 1.8 | 52.1 | 1.8 | 42.5 | 1.8 | <0.00* |
Jitter% | 1.44 | 0.05 | 1.50 | 0.05 | 1.44 | 0.05 | 0.54 |
Shimmer% | 7.7 | 0.27 | 7.8 | 0.27 | 7.7 | 0.27 | 0.85 |
NHR | 0.13 | 0.01 | 0.12 | 0.01 | 0.12 | 0.01 | 0.41 |
CPPS (dB) | 13.1 | 0.16 | 9.1 | 0.16 | 8.9 | 0.16 | <0.00* |
AVQI | 2.89 | 0.24 | 2.89 | 0.24 | 2.89 | 0.24 | 1.00 |
Speech | |||||||
fo (Hz) | 190 | 3.39 | 205 | 3.39 | 199 | 3.39 | 0.04* |
SD of fo (Hz) | 44.9 | 1.8 | 66.6 | 1.8 | 52.3 | 1.8 | <0.00* |
Jitter% | 1.8 | 0.05 | 1.9 | 0.05 | 1.8 | 0.05 | 0.86 |
Shimmer% | 8.8 | 0.26 | 9.0 | 0.26 | 8.9 | 0.26 | 0.94 |
NHR | 0.16 | 0.01 | 0.15 | 0.01 | 0.15 | 0.01 | 0.14 |
CPPS (dB) | 8.2 | 0.12 | 8.3 | 0.12 | 8.2 | 0.12 | 0.53 |
Note. N=19. Analysis of variance (ANOVA); Rater 1 (R1) and Rater 2 (R2).
*p<0.05
Table 3.
Phrase | VoiceEvalU8 vs. R1 | VoiceEvalU8 vs. R2 | R1 vs. R2 | Correlations |
---|---|---|---|---|
SD of fo | p = 1.00 | p = 0.002* | p = 0.001* | ICC = 0.75, p < 0.00* |
CPPS | p < 0.00* | p < 0.00* | p = 1.00 | ICC = 0.65, p < 0.00* |
Speech | ||||
fo | p = 0.01* | p = 0.25 | p = 0.66 | ICC = 0.97, p < 0.00* |
SD of fo | p = 0.01* | p < 0.00* | p < 0.00* | ICC = 0.90, p < 0.00* |
Note. N=19. Rater 1 (R1) and Rater 2 (R2).
*p<0.05
Speech
For the 15-s speech sample, the IV of analysis method was not significant for jitter%, shimmer%, NHR, and CPPS (see Table 2). The IV of analysis method was significant for fo and SD of fo in the 15-s speech sample. Due to the significance for fo and SD of fo, post hoc pairwise comparisons using the Bonferroni method were used to further assess VoiceEvalU8 vs. R1, VoiceEvalU8 vs. R2, and R1 vs. R2 (see Table 3). For fo, there was only a significant difference between VoiceEvalU8 and R1 (p = 0.01), indicating that R1’s version of Praat (6.0.46 from January 2019) and VoiceEvalU8’s version of Praat (6.0.19 from June 2016) were responsible for the difference rather than the Praat version used by R2 (6.0.32 from September 2017). For SD of fo, there were significant differences between all three comparisons (p = 0.01 for VoiceEvalU8 vs. R1 and p < 0.00 for VoiceEvalU8 vs. R2 and R1 vs. R2). All three Praat versions produced different mean SD of fo values in speech.
Correlation Analysis
ICCs were conducted for SD of fo and CPPS in the phrase and fo and SD of fo in the 15-s speech sample due to significant differences (p<0.05) in the one-way ANOVAs. The results of the correlation analyses presented in Table 3 show that SD of fo and CPPS in the phrase were statistically significant with moderate correlations (ICC = 0.75, p < 0.00 for SD of fo and ICC = 0.65, p < 0.00 for CPPS). fo and SD of fo in speech had good to excellent statistically significant correlations (ICC = 0.97, p < 0.00 for fo and ICC = 0.90, p < 0.00 for SD of fo). In general, the results suggest that there was moderate to excellent reliability for SD of fo and CPPS in the phrase and for fo and SD of fo in speech across the different versions of Praat.
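To make the reliability analysis concrete, the sketch below computes an absolute-agreement ICC from a recordings-by-methods table and maps ICC values onto the interpretation bands cited in the Statistical Analysis section. Several points are assumptions: the single-measures form, ICC(A,1), is used because the type (single or average measures) is not stated here; a boundary value such as 0.75 is assigned to the lower band, which matches how the values are described in the text; and the function names and the small data table are hypothetical. The ICC values in `reported` are those listed in Table 3.

```python
def icc_absolute_agreement(scores):
    """Single-measures ICC for absolute agreement, ICC(A,1).

    scores: one row per recording, one column per analysis method.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(icc):
    """Map an ICC onto the poor / moderate / good / excellent bands."""
    if icc < 0.5:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical SD of fo values (Hz): rows are recordings, columns are methods.
example = [
    [43.0, 51.5, 42.0],
    [38.5, 47.0, 39.5],
    [50.0, 58.5, 49.0],
    [45.5, 53.0, 46.5],
]
example_icc = icc_absolute_agreement(example)

# ICC values as reported in Table 3.
reported = {
    "phrase SD of fo": 0.75,
    "phrase CPPS": 0.65,
    "speech fo": 0.97,
    "speech SD of fo": 0.90,
}
labels = {name: interpret_icc(value) for name, value in reported.items()}
```

Because absolute agreement penalizes systematic offsets between methods, a measure can show a consistent ranking of recordings across Praat versions and still receive a lower ICC when one version yields uniformly higher values.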
Discussion
The IV of analysis method was not significant for any acoustic measure (i.e., fo, SD of fo, jitter%, shimmer%, NHR, and CPPS) in all three 5-s trials of sustained /a/. There was also no significance by analysis method for fo, jitter%, shimmer%, NHR, and AVQI in the phrase “we were away a year ago” and for jitter%, shimmer%, NHR, and CPPS in the 15-s speech sample. These results indicate that the automatic and manual procedures of using multiple versions of Praat were similar in the calculation of all acoustic measures during sustained phonation and several of the acoustic measures during the phrase and speech. For the phrase, the IV of analysis method was significant for SD of fo and CPPS. For phrase SD of fo, there were statistically significant differences between VoiceEvalU8 vs. R2 and R1 vs. R2, but not between VoiceEvalU8 vs. R1. For phrase CPPS, there were statistically significant differences between VoiceEvalU8 vs. R1 and VoiceEvalU8 vs. R2, but not between R1 vs. R2. ICCs demonstrated moderate, statistically significant correlations for phrase SD of fo and for phrase CPPS across the multiple versions of Praat calculated automatically by VoiceEvalU8 or manually by R1 and R2. For speech, the IV of analysis method was significant for fo and SD of fo. For speech fo, there was a statistically significant difference between VoiceEvalU8 vs. R1, but not between VoiceEvalU8 vs. R2 and R1 vs. R2. For speech SD of fo, there were statistically significant differences between all three points of comparison (i.e., VoiceEvalU8 vs. R1, VoiceEvalU8 vs. R2, and R1 vs. R2). ICCs demonstrated good to excellent significant correlations for speech fo and speech SD of fo across the multiple versions of Praat calculated either automatically by VoiceEvalU8 or manually by R1 and R2.
VoiceEvalU8 uses automation scripts on the VoiceEvalU8 server to run analysis through Praat algorithms version 6.0.19 from June 2016 and the Maryn and Weenink20 procedures for CPPS and AVQI. R1 used manual analysis from Praat version 6.1.05 for CPPS, version 6.0.46 for the other acoustic measures, and Maryn and Weenink20 procedures for AVQI. R2 used manual analysis from Praat version 6.0.32 for all acoustic measures and Maryn and Weenink20 procedures for AVQI. Both R1 and R2 used default settings in Praat to calculate CPPS rather than the Maryn and Weenink20 procedures. All the acoustic measures (i.e., fo, SD of fo, jitter%, shimmer%, NHR, and CPPS) during the three 5-s sustained /a/ trials were similar across automatic and manual analysis of multiple versions of Praat. This finding demonstrates that when analyzing clients’ voices during sustained /a/, the acoustic measures will be similar across the multiple versions of Praat assessed in this study; therefore, comparisons can be made to the mean values reported in a clinical report or the literature. Clinicians and researchers should report the version of Praat used and the specific steps for analysis.
In Brockmann et al,27 female participants produced sustained /a/ for 5 s with the middle portion of the vowel being used for analysis. The participants spoke Swiss German or German and the average age was 28 years with no voice disorders. The sustained /a/s were analyzed via Praat (version not specified). The mean fo was 214 Hz. The mean fo in the current study for young American English-speaking females with normal voices between the ages of 20–30 ranged from 212–217 Hz during the three 5-s /a/ trials across multiple versions of Praat. Grillo et al5 analyzed sustained /a/ across MDVP and Praat (version not specified) in young vocally healthy females who spoke American English from smartphones and a head-mounted microphone. The mean fo values during sustained /a/ calculated through Praat ranged from 216–219 Hz. The mean fo values across all three studies were consistent. The range of the mean values for jitter% (0.53–0.60), shimmer% (4.74–5.23), and NHR (0.03–0.04) during sustained /a/ in the current study was consistent with jitter% (0.39–0.58), shimmer% (2.29–5.29), and NHR (0.01–0.05) in Grillo et al.5 The range of SD of fo during /a/ (10.1–12.9) in the current study was different from the range reported in Grillo et al5 (6.29–7.29). Perhaps the versions of Praat used in the current study and Grillo et al5 were responsible for the differences because the gender, age range, and language spoken did not change. CPPS during sustained /a/ in the current study ranged from 12.4–13.4 dB and there were no significant differences between the default settings of the two versions of Praat used by R1 and R2 and the Maryn and Weenink20 procedures used by VoiceEvalU8. Other studies have reported mean CPPS values during sustained /a/. Watts et al15 reported a mean CPPS of 22.6 dB, which included men, women, dysphonic, and non-dysphonic speakers using Maryn and Weenink20 procedures.
Brockmann-Bauser et al28 reported a mean CPPS of 16 dB using Praat (version 5.4.1.4) for vocally healthy women at a comfortable loudness level using Maryn and Weenik20 procedures. Phadke et al29 reported mean CPPS values of 13.6 dB for vocally healthy female teachers using Praat (version 5.4.05) and Maryn and Weenik20 procedures. The CPPS mean values during sustained /a/ reported in Brockmann-Bauser et al28 and Phadke et al29 are consistent with the range reported in the current study. Brockmann-Bauser et al28 and Phadke et al29 used only vocally healthy females similar to the current study, while Watts et al15 included men, women, dysphonic, and non-dysphonic voice users. The differences in the mean values could be due to the variability of participants across the studies from vocally healthy females in Brockmann-Bauser et al28, Phadke et al29, and the current study to dysphonic and non-dysphonic men and women in Watts et al.15
For the phrase “we were away a year ago,” jitter%, shimmer%, and NHR were similar across the automatic and manual versions of Praat. To our knowledge, there is no other literature that reports jitter%, shimmer%, and NHR from Praat in the all-voiced phrase “we were away a year ago.” Because the phrase has all voiced phonemes, perhaps it offers an alternative to sustained vowels, representing a more functional view of the client’s phonation pattern in speech while maintaining voicing to assess time-based measures. AVQI, which involves both sustained /a/ and the phrase, was also similar across the versions of Praat in the current study. Both the automatic version by VoiceEvalU8 and the manual versions by R1 and R2 used the Maryn and Weenink20 procedures. The average AVQI value reported in the current study (mean of 2.89 with standard deviation of 0.24) was consistent with normative female data in the literature (mean of 2.3 with standard deviation of 0.70).30
For a 15-s speech sample, jitter%, shimmer%, NHR, and CPPS were similar across the automatic and manual versions of Praat. We could not find other studies that assessed jitter%, shimmer%, and NHR in speech; therefore, we were unable to make comparisons. A recent manuscript accepted for publication has demonstrated that, in vocally healthy females, the voiced phrase “we were away a year ago” and a 15-s speech sample were more successful than sustained /a/ in showing voice changes from pre to post in NHR (phrase and speech) and jitter% (speech).31 Based on the recent work,31 it may be beneficial to use NHR and jitter% in the voiced phrase and speech, but more research is needed. For speech CPPS, Phadke et al29 reported mean CPPS values of 10.4 dB with a standard deviation of 1.5 at a comfortable pitch and loudness for vocally healthy female teachers during speech with no sibilants. Phadke et al29 used Maryn and Weenink20 procedures to calculate CPPS. The current study demonstrated CPPS values (mean range of 8.2–8.3 dB with a standard deviation of 0.12) that were similar to Phadke et al.29 In contrast, Sauder et al16 analyzed the second sentence in the Rainbow Passage, “The rainbow is a division of white light into many beautiful colors,” in vocally healthy females and reported a CPPS mean of 20.11 dB with a standard deviation of 1.27. Sauder et al16 used the default settings in Praat version 6.0.17 to calculate CPPS rather than the Maryn and Weenink20 procedures. These examples highlight the need for clinicians and researchers to describe CPPS calculation methods because the CPPS values vary depending upon the method used.
CPPS in the phrase was statistically different across the multiple versions of Praat in the current study. We are aware of one other study that reported CPPS values using Maryn and Weenink20 procedures during the phrase “we were away a year ago.”15 Watts et al15 reported a mean CPPS value in the phrase of 20.07 dB. The mean reported in Watts et al15 included men, women, non-dysphonic, and dysphonic participants. The mean phrase CPPS in the current study ranged from 8.9–13.1 dB. Both the current study and Watts et al15 used the same Maryn and Weenink20 procedures. Maryn and Weenink20 reported CPPS values from the mid-portion of /a/ and the voiced phonemes from a Dutch phonetically balanced text. The mean CPPS was 11.66 dB with a standard deviation of 2.68 in Maryn and Weenink,20 better approximating the range of mean CPPS values during the phrase in the current study (mean range from 8.9–13.1 dB with a standard deviation of 0.16). The Praat version used by VoiceEvalU8 was responsible for the differences in phrase CPPS in the current study. Perhaps the difference was related to something unique within the Praat source code and algorithms used by VoiceEvalU8 that was apparent only in phrase CPPS and not for sustained /a/ and speech. In addition, perhaps the methods used to calculate CPPS in the default Praat settings compared to the Maryn and Weenink20 procedures were responsible for the difference in phrase CPPS values. In the current study, even though the mean phrase CPPS values from the automatic analysis used by VoiceEvalU8 with the Maryn and Weenink20 procedures differed from those obtained with the default CPPS settings in both the manual versions of Praat used by R1 and R2, the values were moderately and significantly correlated among the versions. Therefore, the patterns across the versions share the same trajectory but not the same absolute values. Other literature has also reported strong and significant correlations for CPPS when comparing software programs of Praat and ADSV.15,16
SD of fo in the phrase and fo and SD of fo in speech were different across the automatic and manual versions of Praat. For phrase SD of fo, the manual version used by R2 (6.0.32) was responsible for the difference. Speech fo calculated from R1’s Praat version (6.0.46) and from VoiceEvalU8’s Praat version (6.0.19) appeared to be the driving force behind the differences, rather than R2’s Praat version (6.0.32). For speech SD of fo, all the Praat versions used by VoiceEvalU8, R1, and R2 produced different values. Perhaps the differences for these measures were related to changes in the Praat algorithms or source code; however, the three sustained /a/ trials showed no differences even though the same algorithms were used. The inclusion of other phonemes beyond the vowel /a/ may have also contributed to the differences. In addition, even though the mean values were different for phrase SD of fo, speech fo, and speech SD of fo, there was moderate to excellent reliability among the different versions of Praat as indicated by the ICCs. For assessment of voice therapy outcomes, if clinicians use the same Praat version for a client at pre-treatment, during treatment, and post-treatment, then it is more likely that the change achieved in the measures is from the treatment rather than from using a different version of Praat. When reporting and using normative data, clinicians and researchers need to describe the methods used for data analysis and indicate the version of Praat used.
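The distinction drawn above between differing mean values and moderate to excellent reliability can be illustrated numerically. The following is a minimal Python sketch, not the analysis run in this study: the fo values and the per-version offsets are invented for illustration, the function names are hypothetical, and the choice of single-measure two-way ICC forms (consistency and absolute agreement) is an assumption because the ICC model is not specified in this section. The interpretation thresholds follow the guideline of Koo and Li.26

```python
# Hypothetical illustration: a constant offset between Praat versions changes
# the mean values without changing how the recordings are ranked.
# All numbers below are invented; they are NOT data from this study.


def two_way_mean_squares(ratings):
    """Return (MSR, MSC, MSE) for an n-recordings x k-versions table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    sse = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    mse = sse / ((n - 1) * (k - 1))
    return msr, msc, mse


def icc_consistency(ratings):
    """Single-measure, two-way ICC that ignores systematic version offsets."""
    k = len(ratings[0])
    msr, _, mse = two_way_mean_squares(ratings)
    return (msr - mse) / (msr + (k - 1) * mse)


def icc_agreement(ratings):
    """Single-measure, two-way ICC that penalizes systematic version offsets."""
    n, k = len(ratings), len(ratings[0])
    msr, msc, mse = two_way_mean_squares(ratings)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def interpret_icc(icc):
    """Label an ICC with the Koo and Li cut-offs (0.5, 0.75, 0.9)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Invented fo values (Hz) for five recordings, and invented constant offsets
# standing in for three software versions.
base_fo = [210.0, 215.0, 220.0, 205.0, 225.0]
version_offsets = [0.0, 3.0, -2.0]
ratings = [[fo + offset for offset in version_offsets] for fo in base_fo]

for name, value in (
    ("consistency", icc_consistency(ratings)),
    ("absolute agreement", icc_agreement(ratings)),
):
    print(f"ICC ({name}): {value:.3f} ({interpret_icc(value)})")
```

In this invented example, the consistency form is at its maximum because every version ranks the recordings identically, whereas the absolute-agreement form is lower because the version means differ. The same logic underlies the recommendation to keep the Praat version constant across pre-treatment, during-treatment, and post-treatment measurements.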
Conclusions
Results from this study provide evidence that acoustic measures of fo, SD of fo, jitter%, shimmer%, NHR, and CPPS calculated from sustained /a/ were similar across multiple versions of Praat either calculated automatically (i.e., VoiceEvalU8) or manually (i.e., Rater 1 and Rater 2). AVQI, calculated from a combination of sustained /a/ and the phrase “we were away a year ago” according to Maryn and Weenink20 procedures, was also similar across the automatic (VoiceEvalU8) and manual (R1 and R2) Praat versions. There were differences in mean values for SD of fo and CPPS in the phrase across multiple versions of Praat, while fo, jitter%, shimmer%, and NHR in the phrase were similar across Praat versions. Phrase SD of fo and phrase CPPS were each significantly correlated across the Praat versions, indicating moderate reliability even though the values differed. In speech, there were differences in mean values for fo and SD of fo, while jitter%, shimmer%, NHR, and CPPS were similar across Praat versions. Speech fo and speech SD of fo were each significantly correlated across the Praat versions, indicating good to excellent reliability. For both CPPS in sustained /a/ and speech, the default CPPS settings in the different Praat versions and the methods described by Maryn and Weenink20 to calculate CPPS produced similar mean CPPS values. However, phrase CPPS was different between the default settings in Praat and the Maryn and Weenink20 procedures, but demonstrated moderate reliability between the analysis methods. VoiceEvalU8 provides an automatic option for the calculation of acoustic measures using Praat source code and algorithms that allows clinicians and researchers to complete televoice evaluations either in person or through synchronous and asynchronous telepractice.
Acknowledgements
The work described in this publication was supported by the National Institute on Deafness and Other Communication Disorders, R15DC014566. The authors would like to acknowledge Katherine Gale for her work analyzing the data for this study and presenting the findings at state and national conferences. Aspects of the manuscript were presented at the American Speech-Language-Hearing Association (ASHA) convention in Orlando, FL, November 2019. Katherine Gale and Jeremy Wolfberg received the Elizabeth Grillo Education and Travel Fund Award in November 2019 for presenting the work described in this manuscript at the ASHA convention.
Financial Disclosures
Jeremy Wolfberg received the Elizabeth Grillo Education and Travel Fund Award in November 2019 for presenting the work described in this manuscript at the American Speech-Language-Hearing Association convention in Orlando, FL.
Elizabeth Grillo has the following financial disclosures: salary and benefit support from West Chester University as an employee, royalties from Northern Speech Services for online continuing education courses, grant support from the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health (NIH) R15DC014566 and the National Loan Repayment Program, and subscription fees from VoiceEvalU8, LLC.
References
- 1. Morris MA, Meier SK, Griffin JM, Branda ME, & Phelan SM. Prevalence and etiologies of adult communication disabilities in the United States: Results from the 2012 National Health Interview Survey. Disability and Health Journal, (2016), 9(1), 140–144. doi: 10.1016/j.dhjo.2015.07.004
- 2. Bhattacharyya N. The prevalence of voice problems among adults in the United States. Laryngoscope, (2014), 124, 2359–2362. doi: 10.1002/lary.24740
- 3. Stachler RJ, Francis DO, Schwartz SR, Damask CC, Digoy GP, Krouse HJ, … Nnacheta LC. Clinical Practice Guideline: Hoarseness (Dysphonia) (Update). Otolaryngology–Head and Neck Surgery, (2018), 158(1)(suppl.), 1–42. doi: 10.1177/0194599817751030
- 4. American Speech-Language-Hearing Association. Voice Disorders. (Practice Portal) www.asha.org/Practice-Portal/Clinical-Topics/Voice-Disorders/. (Accessed January 15, 2020). (n.d.).
- 5. Grillo EU, Brosious JN, Sorrell SL, & Anand S. Influence of smartphones and software on acoustic voice measures. International Journal of Telerehabilitation, (2016), 8(2), 9–14. doi: 10.5195/ijt.2016.6202
- 6. Roy N, Barkmeier-Kraemer J, Eadie T, Sivasankar MP, Mehta D, Paul D, & Hillman R. Evidence-based clinical voice assessment: A systematic review. American Journal of Speech-Language Pathology, (2013), 22, 212–226. doi: 10.1044/1058-0360(2012/12-0014)
- 7. American Speech-Language-Hearing Association. Preferred Practice Patterns for the Profession of Speech-Language Pathology [Preferred Practice Patterns]. (2004). Available from www.asha.org/policy.
- 8. Grillo EU. An online telepractice model for the prevention of voice disorders in vocally healthy student teachers evaluated by a smartphone application. Perspectives of the ASHA Special Interest Groups, (2017), 2(3), 63–78. doi: 10.1044/persp2.SIG3.63
- 9. Grillo EU. Building a successful voice telepractice program. Perspectives of the ASHA Special Interest Groups, (SIG 3), (2019), 4(1), 100–110. doi: 10.1044/2018_PERS-SIG3-2018-0014
- 10. VoiceEvalU8. VoiceEvalU8 (version 1.2) [Mobile application, HIPAA-compliant server, and web portal]. Retrieved from https://voiceevalu8store.com/home/. (Accessed January 15, 2020) (2019).
- 11. American Speech-Language-Hearing Association. Scope of Practice in Speech-Language Pathology [Scope of Practice]. (2016) Available from www.asha.org/policy.
- 12. World Health Organization. ICF: International Classification of Functioning, Disability, and Health. Geneva, Switzerland. (2001)
- 13. Boersma P, & Weenink D. Praat: Doing phonetics by computer [Computer program]. Versions 6.1.05, 6.0.46, 6.0.32, 6.0.19. Retrieved from http://www.praat.org. (2019).
- 14. Kempster GB, Gerratt BR, Verdolini Abbott K, Barkmeier-Kraemer J, & Hillman RE. Consensus auditory-perceptual evaluation of voice: Development of a standardized clinical protocol. American Journal of Speech-Language Pathology, (2009), 18, 124–132. doi: 10.1044/1058-0360(2008/08-0017)
- 15. Watts CR, Awan SN, & Maryn Y. A comparison of cepstral peak prominence measures from two acoustic analysis programs. Journal of Voice, (2017), 31(3), 387–397.
- 16. Sauder C, Bretl M, & Eadie T. Predicting voice disorder status from smoothed measures of cepstral peak prominence using Praat and Analysis of Dysphonia in Speech and Voice (ADSV). Journal of Voice, (2017), 31(5), 557–566. doi: 10.1016/j.jvoice.2017.01.006
- 17. Maryn Y, Roy N, De Bodt M, et al. Acoustic measurement of overall voice quality: A meta analysis. J Acoust Soc Am, (2009), 126, 2619–2634.
- 18. Lowell S, Colton R, Kelley R, et al. Predictive value and discriminant capacity of cepstral- and spectral-based measures during continuous speech. Journal of Voice, (2013), 27, 393–400.
- 19. Patel RR, Awan SN, Barkmeier-Kraemer J, et al. Recommended protocols for instrumental assessment of voice: American Speech-Language-Hearing Association Expert Panel to develop a protocol for instrumental assessment of vocal function. American Journal of Speech-Language Pathology, (2018), 27(3), 887–905.
- 20. Maryn Y, & Weenink D. Objective Dysphonia Measures in the Program Praat: Smoothed Cepstral Peak Prominence and Acoustic Voice Quality Index. Journal of Voice, (2015), 29(1), 35–43. doi: 10.1016/j.jvoice.2014.06.015
- 21. Deliyski DD, Shaw HS, Evans MK, & Vesselinov R. Regression tree approach to studying factors influencing acoustic voice analysis. Folia Phoniatr Logop, (2006), 58, 274–288.
- 22. Vasilakis M, & Stylianou Y. Spectral jitter modeling and estimation. Biomed Signal Process Control, (2009), 4, 183–193.
- 23. Amir O, Wolf M, & Amir N. A clinical comparison between two acoustic analysis softwares: MDVP and Praat. Biomed Signal Process Control, (2009), 4, 202–205.
- 24. Lovato A, De Colle W, Giacomelli L, et al. Multi-Dimensional Voice Program (MDVP) vs Praat for assessing euphonic subjects: A preliminary study on the gender-discriminating power of acoustic analysis software. Journal of Voice, (2016), 30(6), 765–770.
- 25. Oguz H, Kilic MA, & Safak MA. Comparison of results in two acoustic analysis programs: Praat and MDVP. Turk J Med Sci, (2011), 41, 835–841.
- 26. Koo TK, & Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J chiropractic med, (2016), 15(2), 155–163.
- 27. Brockmann M, Drinnan MJ, Claudio Storck C, & Carding PN. Reliable jitter and shimmer measurements in voice clinics: The relevance of vowel, gender, vocal intensity, and fundamental frequency effects in a typical clinical task. Journal of Voice, (2011), 25(1), 44–53.
- 28. Brockmann-Bauser M, Van Stan J, Sampaio M, Bohlender JE, Hillman RE, & Mehta DD. Effects of vocal intensity and fundamental frequency on cepstral peak prominence in patients with voice disorders and vocally healthy controls. Journal of Voice, (2019), 16, doi: 10.1016/j.jvoice.2019.11.015
- 29. Phadke KV, Laukkanen AM, Ilomäki I, Kankare E, Geneid A, & Švec JG. Cepstral and perceptual investigations in female teachers with functionally healthy voice. Journal of Voice, (2020), 34(3), 485.e33–485.e43. doi: 10.1016/j.jvoice.2018.09.010
- 30. Barsties V Latoszek B, Ulozaitė-Stanienė N, Maryn Y, Petrauskas T, & Uloza V. The influence of gender and age on the Acoustic Voice Quality Index and Dysphonia Severity Index: A normative study. Journal of Voice, (2019), 33(3), 340–345. doi: 10.1016/j.jvoice.2017.11.011
- 31. Grillo EU. A nonrandomized trial for student teachers of an in-person and telepractice Global Voice Prevention and Therapy Model with Estill Voice Training assessed by the VoiceEvalU8 app. American Journal of Speech-Language Pathology, (accepted).