Abstract
Purpose:
This study aimed to investigate the accuracy and reliability of subjective intelligibility estimates of young children's speech made by speech-language pathologists (SLPs) and to examine how listener expertise influences intelligibility estimates by comparing SLPs to naive listeners.
Method:
Eighteen certified SLPs and 18 naive listeners provided intelligibility ratings of single-word speech samples produced by six preschoolers with speech disorders. All listeners rated intelligibility using two different methods: orthographic transcription and subjective estimation of the percentage of words understood. Absolute differences between estimated and transcription intelligibility scores were used to examine accuracy of intelligibility estimates in both listener groups, and intraclass correlations were used to evaluate interrater reliability.
Results:
Subjective intelligibility estimates differed from orthographic transcription-based intelligibility scores by 12.4%, on average, in the SLP listener group and 18.9% in the naive listener group. Interrater reliability of estimated intelligibility was substantially lower than that of transcription intelligibility in both listener groups.
Conclusion:
Results of this preliminary study suggest that subjective intelligibility estimates by SLPs are not adequately accurate or reliable for measurement of children's speech intelligibility.
Speech intelligibility has been defined as “the degree to which the (speaker's) acoustic signal is understood by a listener” (Yorkston et al., 1996, p. 55). It is fundamental to a speaker's communicative success. Intelligibility is perhaps the most important metric of a child's spoken communicative competence, given that the primary goal of speaking is to be understood. Reduced speech intelligibility can have wide-ranging negative effects on children's communicative participation and quality of life. Poor intelligibility is associated with reduction in young children's initiation of verbal interactions (Pennington & McConachie, 2001) and restriction in participation in daily activities, and ultimately can result in feelings of frustration and diminished social connectedness (Allison et al., 2024; Connaghan et al., 2022; Mei et al., 2014; Most et al., 2012). For children with motor speech disorders (MSDs), reduced intelligibility can be a lifelong challenge, and the social impacts can become more evident as children reach adolescence (Connaghan et al., 2022); however, reduced intelligibility is a primary consequence of childhood speech disorders regardless of the underlying etiology. Speech intelligibility was included as a functional communication measure in the National Outcomes Measurement System (NOMS; American Speech-Language-Hearing Association, 2023), a data collection registry designed to understand functional communication outcomes in school and health care settings. Data acquired using the NOMS showed that 75% of 3- to 5-year-old children receiving speech-language services in schools were receiving treatment for reduced speech intelligibility (Mullen & Schooling, 2010). Because intelligibility is an important functional outcome measure and a common treatment goal for children with speech disorders, accurate and reliable methods for measuring intelligibility are critical for clinical assessment.
Despite the recognized importance of intelligibility as a clinical outcome measure, a wide gap exists between how it is measured in research and in clinical settings. In research, orthographic transcription of words or sentences by naive listeners is the “gold standard” and provides an objective measure of intelligibility by quantifying the percentage of words understood by a listener; however, in clinical settings, this approach is not often used due to barriers such as limited time, training, and resources (Ertmer, 2011), and workplace policies that prioritize assessment of other skills (King et al., 2012). Instead, speech-language pathologists (SLPs) tend to document intelligibility by estimating the proportion of a child's speech they think they understand (Gordon-Brannan & Hodson, 2000; Skahan et al., 2007). The current study was motivated by the need to better understand the accuracy and reliability of intelligibility estimates by SLPs, given that estimation remains a commonly used assessment method in clinical practice.
Measurement of Speech Intelligibility Using Orthographic Transcription
Orthographic transcription tasks for measuring speech intelligibility have been described in the literature for several decades (Allison, 2020; Hustad et al., 2015; Miller, 2013; Stipancic et al., 2016; Yorkston & Beukelman, 1978). In this approach, naive listeners are asked to listen to an audio-recorded speech sample (i.e., a set of known words or sentences produced by a speaker) and write down what they think the speaker said. A researcher then compares each listener's transcriptions to the target words produced by the speaker and calculates the percentage of words correctly identified by the listener, which serves as the intelligibility score. As such, this method yields an objective (i.e., data-based) metric of intelligibility. Because intelligibility is a result of the dyadic interaction between a speaker and a listener, there are many listener factors (e.g., familiarity with the speaker and the speech sample) and linguistic factors (e.g., predictability of words) that can influence intelligibility above and beyond the clarity of the speaker's acoustic signal (Flipsen, 1995; Hustad & Cahill, 2003; Mahr & Hustad, 2023). Thus, for any given speaker, there is no one “true” intelligibility score; intelligibility will vary across listeners and contexts. In both the pediatric and adult motor speech literature, intelligibility has typically been measured based on orthographic transcription by naive listeners because this approach eliminates potential confounding effects of listener familiarity or training on intelligibility. Thus, orthographic transcription provides a quantitative metric of intelligibility against which other measurement methods can be compared.
In children, intelligibility measured through orthographic transcription has been shown to be useful in identifying the presence of an MSD (Hustad et al., 2015, 2019, 2021), indexing severity of a speech disorder (Gordon-Brannan & Hodson, 2000; Lee et al., 2014), establishing growth curves for understanding speech development in children with and without speech disorders (Hustad et al., 2020, 2021; Mahr et al., 2017), qualifying a child for speech intervention (Gordon-Brannan & Hodson, 2000), and determining whether progress has been made in response to intervention (Pennington et al., 2016). Developmental norms have recently been published for intelligibility of children ages 2–9 years based on orthographic transcription by naive listeners (Hustad et al., 2021). These norms have the potential to serve as a key resource for clinicians; however, their clinical use requires SLPs to measure the intelligibility of children on their caseloads using analogous procedures that allow direct comparison to normative values.
Although orthographic transcription has been shown to be a valid and reliable method for determining children's intelligibility, collecting and scoring results can be resource heavy and time-consuming. SLPs have been surveyed about their opinions regarding speech intelligibility and their approach to the assessment of this skill in children and adults, and the majority agree that intelligibility should be examined (Gurevich & Scamihorn, 2017; Skahan et al., 2007). Despite this agreement, results from a survey focusing on the methods that school-based SLPs use to determine whether a child with speech sound disorder (SSD) receives intervention highlighted that less than 40% of respondents incorporate intelligibility measures in testing because evaluation of this skill is not mandatory to qualify a child for receipt of school-based speech therapy (Farquharson & Tambyraja, 2019). Instead, informal clinical estimates are often included in speech evaluations, if intelligibility is discussed at all (King et al., 2012; McLeod & Baker, 2014).
Several barriers to implementation of orthographic transcription for measuring intelligibility have been identified, including the lack of availability of standardized tests (Gurevich & Scamihorn, 2017), the time and technology needed to record and edit audio samples (Ertmer, 2011), and challenges with recruiting naive listeners to transcribe speech samples as a basis for intelligibility ratings (Ertmer, 2011). In most clinical settings, naive listeners are not readily available, and patient confidentiality prohibits sharing of speech samples outside the clinical team. Other SLP colleagues are more likely to be available to assist with transcription; however, they cannot be considered naive listeners due to their training and expertise. If SLPs could recruit colleagues who are unfamiliar with a given child to transcribe speech samples, this could reduce or remove one primary barrier to implementing orthographic transcription in clinical settings. Because most of the published research on intelligibility assessment in children has been obtained using naive listeners as judges, clinical application of research findings requires SLPs to measure intelligibility using methods comparable to those used in research studies. Therefore, understanding how intelligibility ratings obtained by SLPs compare to ratings of naive listeners could provide important information about the potential utility of using SLPs as listeners in clinical settings.
Estimation and Scaled Ratings of Speech Intelligibility
A variety of scaled rating procedures have also been used for measuring intelligibility in prior research (Kent et al., 1994). These methods involve listeners making internal judgments regarding how much of a speaker's speech they think they understood. Thus, these methods are inherently subjective but have the benefit of being quick to implement. As with orthographic transcription, listeners are asked to make these scaled ratings after listening to recorded speech samples. Scaled rating methods include equal-appearing interval scales (Hashemi Hosseinabad et al., 2022; Hustad et al., 2012), continuous scales such as visual analog scales (VASs; Abur et al., 2019; Stipancic et al., 2016; Tjaden et al., 2014), as well as methods that require comparison to a reference sample, such as direct magnitude estimation (DME; Walshe et al., 2008; Weismer & Laures, 2002). VAS and DME ratings have been shown to correlate with orthographic transcription scores of adult speakers (Hirsch et al., 2022; Stipancic et al., 2016; Weismer & Laures, 2002); however, they have also been primarily used for research purposes and are not commonly used in clinical settings. Equal-appearing interval scales for broadly rating severity of intelligibility impairment have been integrated into some clinical tests (e.g., the Frenchay Dysarthria Assessment; Enderby, 1983) and a validated screening measure for SSDs (i.e., the Intelligibility in Context Scale; McLeod et al., 2012). Although these scales are easy for SLPs to implement in clinical settings, there are issues with scale properties that call their validity into question. Intelligibility has been shown to be perceived as prothetic, meaning that listeners' sensitivity to differences in intelligibility is additive rather than constant across the severity spectrum (Schiavetti et al., 1981). Therefore, scales that rely on equal intervals or linear rating of intelligibility may yield ratings that are not comparable across raters and time points. Studies of clinical practices surrounding intelligibility measurement have reported that SLPs most commonly either informally estimate intelligibility using a percentage based on clinical impression or use descriptive categories (e.g., severely unintelligible) to describe a client's intelligibility (King et al., 2012; Skahan et al., 2007).
Previous studies have yielded mixed findings regarding the relationship between subjective ratings of speech intelligibility and orthographic transcriptions by naive listeners. Some studies have shown that unfamiliar, untrained adults underestimate the proportion of dysarthric speech they understand when their own percent estimates or VAS ratings were later compared to results from their orthographic transcriptions (Hustad, 2006; O'Leary et al., 2021; Stipancic et al., 2016). In contrast, Carter et al. (1996) found that untrained, unfamiliar listeners overestimated intelligibility of adult speakers with moderate-to-severe dysarthria when compared to their orthographic transcriptions. Familiar untrained listeners have also been shown to overestimate intelligibility, compared to their transcriptions. When caregivers were asked to use glossing to determine intelligibility of their children with phonological disorders, subjective estimates of intelligibility exceeded the percentage of words understood (Kwiatkowski & Shriberg, 1992). Yorkston and Beukelman (1978) found no consistent pattern of overestimation or underestimation by untrained listeners, but showed greater dispersion among listeners' intelligibility estimates than transcription scores. Differences in methodology likely account for the variation in findings across these studies, but collectively this body of work suggests that intelligibility estimates by untrained listeners do not accurately or reliably reflect transcription-based intelligibility scores.
The variability noted in untrained listeners' subjective estimates of speech intelligibility may result from the wide range of characteristics listeners attend to when attempting to process a distorted speech signal. Different listeners' internal “yardsticks” may weight aspects of a speech signal differently as each listener prioritizes or discounts specific characteristics of a speaker's speech (Miller, 2013), thus reducing both intrajudge and interjudge reliability of intelligibility estimates compared to scores obtained from orthographic transcription (Yorkston & Beukelman, 1978). The severity of a speaker's speech impairment has also been shown to influence the variability, and therefore the reliability, of listener estimates. Variability in listener intelligibility estimates is lowest for speakers who are highly intelligible or, conversely, highly unintelligible (Hustad, 2006; Samar & Metz, 1988; Yorkston & Beukelman, 1978). Other factors may also contribute to variability in listener ratings, including aspects of the rating task and individual differences in perceptual abilities (McHenry, 2011). Importantly, these prior studies have only examined variability and reliability of listeners' intelligibility estimates of adults with dysarthria and have not been replicated with child speakers. The speaker's age may also influence listeners' subjective estimates, since young children are not expected to be fully intelligible and their speech may be more likely to be perceived as age appropriate, even if all words were not understood.
Listener expertise may also influence intelligibility estimates; however, few studies have directly examined the accuracy and reliability of intelligibility estimates made by SLPs. Understanding this relationship is important, given reports that SLPs often rely on their own estimates to index speech intelligibility (Gordon-Brannan & Hodson, 2000; King et al., 2012; Skahan et al., 2007). Several studies have examined whether expert listeners understand more disordered speech than untrained listeners and have yielded mixed results. While some studies showed that SLPs obtain higher intelligibility scores than untrained listeners for the same speakers (Borrie et al., 2017, 2021; Lundeborg & McAllister, 2007; O'Leary et al., 2021), others found no significant difference between SLPs and untrained listeners (Contardo et al., 2014; Hashemi Hosseinabad et al., 2022; Hirsch et al., 2022; Walshe et al., 2008). Direct comparison of the results obtained from these studies is difficult, given the varied methodologies used; however, the inconsistency in results highlights the complexity of factors affecting listeners' intelligibility ratings. The high variability noted in prior investigations of naive and familiar listeners' subjective estimation of intelligibility has not been examined using SLPs as listeners. Clinicians who are trained to listen to, analyze, and understand disordered speech may approach intelligibility estimation in a more uniform way that results in closer alignment with transcription intelligibility scores and better interrater reliability, compared to untrained listeners.
Hirsch et al. (2022) examined the validity and reliability of SLPs' intelligibility estimates of adults with dysarthria compared to “ground truth” orthographic transcriptions by naive listeners. Results showed that, on average, SLPs' percent intelligibility estimates of adults with dysarthria were a strong predictor of orthographic transcription by naive listeners and that SLPs showed strong intrarater reliability when estimating intelligibility. These findings lend support to the validity and internal consistency of intelligibility estimates made by SLPs; however, the authors note the substantially lower interrater reliability of SLPs' intelligibility estimates at an individual level. Hirsch et al. did not examine the accuracy of SLPs' intelligibility estimates compared to their orthographic transcriptions. In clinical settings, SLPs use intelligibility scores to justify qualifying children for services, make decisions regarding a treatment approach, index severity of the child's speech impairment, and track change in speech over time. For SLPs to be able to rely on subjective estimates of a child's intelligibility for these purposes, their intelligibility estimates need to accurately reflect the percentage of the child's words they understood. Because multiple SLPs may assess and treat the same child over time, it is also critical that intelligibility estimates be reliable across SLPs.
Current Study
The primary purpose of this study was to evaluate the accuracy and reliability of SLPs' intelligibility estimates compared to intelligibility measured through orthographic transcription. A secondary purpose of this study was to compare SLPs' transcription-based and estimated intelligibility ratings to ratings by naive listeners. This comparison is important for assessing whether intelligibility ratings by SLPs can be validly compared to data from naive listeners reported in the pediatric speech literature. To address this goal, we directly compared subjective estimates of children's speech intelligibility to results of objective transcription-based intelligibility measurements in two groups of listeners: certified SLPs and untrained naive listeners. For the purposes of this study, accuracy of intelligibility estimates was defined by how closely each listener's intelligibility estimate of a child speaker matched the percentage of the child's words they correctly identified through orthographic transcription. Our research questions were as follows:
Does the accuracy of listeners' intelligibility estimates differ between SLP and naive listener groups?
Do SLPs or naive listeners systematically overestimate or underestimate children's intelligibility, compared to transcription-based intelligibility scores?
Across the same group of child speakers, do SLPs' transcription-based intelligibility scores or estimated intelligibility scores differ from those of naive listeners?
How does interrater reliability of estimation-based and transcription-based measures of children's speech intelligibility compare between SLP and naive listener groups?
We hypothesized that SLPs would be more accurate in estimating children's speech intelligibility than naive listeners due to their expertise, but that transcription-based measures of speech intelligibility would be more reliable than estimated intelligibility in both groups of listeners.
Method
Participants
Two groups of listeners participated in this study: naive listeners and SLPs. The experimental protocol was the same for both SLPs and naive listeners. Procedures were approved by the Northeastern University Institutional Review Board (No. 18-05-18). All listeners were native speakers of American English and had no history of speech, language, or neurological impairment. In addition, all listeners were required to pass a hearing screening at 25 dB for 500, 1000, 2000, 4000, and 8000 Hz to be eligible to participate.
Naive Listeners
Eighteen naive listeners with no notable experience working or communicating with children who have speech disorders participated in this study. They were recruited through flyers and advertisements around Northeastern University's campus. The naive listeners had an average age of 25 years (SD = 4 years) and included 10 female and eight male participants. Fifteen naive listeners reported their race as White, two reported their race as Asian, and one reported their race as “other.” One listener reported their ethnicity as Hispanic/Latino. All naive listeners completed a background survey that included questions for confirming inclusion criteria and obtaining general demographic information. On this background survey, listeners indicated if they had exposure to individual(s) with communication disorders in clinical settings, research, through jobs or volunteer work, or as friends/family members. If they indicated “yes” to any of these items, they were asked to describe their experience. Anyone who reported regular exposure to a person or people with communication disorders was excluded.
SLPs
Eighteen certified SLPs with at least 1 year of clinical experience working with children who have speech disorders were recruited as expert listeners for this study. The included SLPs had an average age of 40 years (SD = 12 years) and were all female participants. All SLP listeners reported their race as White; one reported her ethnicity as Hispanic/Latino. SLP listeners completed the same background survey as the naive listeners, but also completed additional questions regarding their clinical experience. The included SLPs had an average of 14 years of clinical experience (SD = 12 years, range: 2–36 years). SLPs self-rated their experience with diagnosing and treating pediatric MSDs and SSDs on a 4-point scale (0 = no experience, 1 = little experience, 2 = moderately experienced, 3 = highly experienced). The majority of SLPs reported little to moderate experience diagnosing and treating MSDs, and moderate experience diagnosing and treating SSDs. SLPs' self-ratings of their experience levels are shown in Table 1.
Table 1.
Self-rated experience of speech-language pathologist (SLP) listeners in diagnosing and treating motor speech disorders (MSDs) and speech sound disorders (SSDs) in children.
| Experience level | Diagnosing MSDs | Treating MSDs | Diagnosing SSDs | Treating SSDs |
|---|---|---|---|---|
| No experience | 0% | 6% | 6% | 0% |
| Little experience | 50% | 44% | 17% | 17% |
| Moderately experienced | 44% | 44% | 50% | 61% |
| Highly experienced | 6% | 6% | 28% | 22% |
Note. Data indicate the percentage of SLP listeners reporting each level of experience.
Child Speech Samples
Six child speakers with speech disorders between 3 and 6 years of age were selected from a larger ongoing study to provide speech samples for the current study. Children were selected to reflect a range of speech severity levels, based on results of standardized articulation testing and observation of speech characteristics, and varied speech diagnoses. Three child speakers had MSDs (two with dysarthria and one with childhood apraxia of speech), and three were diagnosed with non–motor-based SSDs. The child speakers were each recorded producing the same set of 78 single words using the Test of Children's Speech + (TOCS+) software (Hodge & Daniels, 2007). The TOCS+ is a validated measure of speech intelligibility in children (Hodge & Gotzke, 2014). Speech samples were recorded with a professional microphone in a sound-attenuated lab space for five of the six child speakers. One child's (MSD3) data collection was conducted remotely via Zoom, using settings recommended by Zoom for optimized audio recording: (a) selecting “Enable Original Sound” in in-meeting options, (b) selecting “High Fidelity Music Mode,” (c) unchecking “Echo Cancellation,” and (d) unchecking “Stereo Audio.” The included children had an average age of 4;8 (years;months; range: 3;4–6;10). There were five male children and one female child. Based on parent report, all child speakers were White and not Hispanic/Latino. Demographic and speech characteristics of the child speakers are shown in Table 2.
Table 2.
Child speaker demographics and speech profiles.
| Child ID | Age (years;months) | Gender | Speech diagnosis | Arizona SS (word articulation) | Medical diagnosis | Speech characteristics |
|---|---|---|---|---|---|---|
| SSD1 | 3;11 | M | SSD | 87 (borderline) | n/a | Multiple phonological error patterns (i.e., cluster reduction, stopping, weak syllable deletion) and lateral distortion of alveolar and palatal fricatives and affricates |
| SSD2 | 3;4 | M | SSD | 74 (moderate) | n/a | Atypical phonological error patterns (i.e., backing, substituting voiceless stops with fricatives, insertions), vowel errors |
| SSD3 | 3;7 | M | SSD | 80 (mild) | n/a | Multiple phonological error patterns (i.e., final consonant deletion, postvocalic devoicing, fronting, stopping, cluster reduction), vowel errors |
| MSD1 | 4;7 | M | Dysarthria | 81 (mild) | Mixed cerebral palsy | Imprecise quality, indistinct syllables in multisyllabic words, slow rate, centralized vowels, better precision in single words than connected speech, intermittent hypernasality, inconsistent voicing errors |
| MSD2 | 6;5 | F | CAS | < 50 (severe) | Temple syndrome | Slow rate, increased difficulty with multisyllabic words, inconsistent productions of words, difficulty smoothly transitioning between syllables, atypical prosody (lexical stress errors and syllable segregation) |
| MSD3 | 6;10 | M | Dysarthria | Not assessed | Athetoid cerebral palsy | Short breath groups, strained and breathy vocal quality, limited pitch range, severe global articulatory imprecision, voicing errors, vowel errors |
Note. The medical diagnosis column is only applicable for the children with MSDs. Speech characteristics describe speech sound errors for children in the SSD group and motor speech characteristics present for children in the MSD group. SS = standard score; SSD = speech sound disorder; M = male; n/a = not applicable; MSD = motor speech disorder; F = female; CAS = childhood apraxia of speech.
Children's speech diagnoses were based on parent-reported medical history information and confirmed by researchers through comprehensive speech and language testing completed during the child's data collection session. In addition to the TOCS+, each child participated in formal word articulation testing using the Arizona Articulation and Phonology Scale–Fourth Revision (Arizona-4; Fudala, 2000), formal receptive language testing using the Test for Auditory Comprehension of Language–Fourth Edition (Carrow-Woolfolk, 2014), and a set of informal speech tasks designed to observe each child's speech characteristics across tasks of varying length and complexity. These informal tasks included a spontaneous speech sample, production of familiar sequences (i.e., counting to 20, reciting the alphabet), five repetitions of the sentence, “Buy Bobby a Puppy,” and a diadochokinesis task in which the child repeated syllables “ba” and “pa” as quickly and clearly as possible on one breath. One child (MSD3) had severe global motor impairment related to cerebral palsy and did not complete the Arizona-4 due to fatigue after completing the other speech tasks. Auditory-perceptual motor speech characteristics were rated across tasks by an experienced SLP (the first author) and used to confirm or rule out motor speech diagnoses. Children with SSDs all exhibited consistent speech sound error patterns and did not show speech features consistent with motor speech impairment.
Procedures: Intelligibility Rating Tasks
Three SLP listeners and three naive listeners were assigned to judge the intelligibility of each child speaker. Each listener heard only one child speaker. All listeners completed the same two intelligibility rating tasks for their assigned child speaker: an intelligibility estimation task and an orthographic transcription task. The estimation task was completed first by all listeners, and the transcription task was completed second. Between these tasks, listeners were asked to complete the background survey. The survey was administered between the two intelligibility rating tasks to minimize potential effects of exposure to words in the estimation task on listeners' word transcriptions in the second task. The background survey took approximately 5 min for listeners to complete.
Child speech samples were recorded in individual files using the TOCS+ software. Speech samples were cleaned to remove background noise or disfluent productions, and peak amplitude normalized using the Normalize effect in Audacity (Audacity Team, 2014) before being presented to listeners. All children had fluent productions of all 78 words.
Data collection sessions were completed either in a sound-attenuated booth at Northeastern University or in a quiet room at an off-site location (e.g., speech-language clinic). Off-campus data collection was offered to facilitate recruitment of SLP listeners. Five SLP listeners and 16 naive listeners completed the experiment at Northeastern. Thirteen SLP listeners and two naive listeners completed the experiment at an off-campus location. All listeners completed the experiment on the same laptop with the computer volume set to a standard level (volume = 75) by researchers and wore the same headphones. The listening experiment took between 20 and 30 min for each listener to complete and was completed in one session.
Intelligibility Estimation Task
The first intelligibility rating task involved listeners making an informal estimate of the child's intelligibility. Prior to beginning the experiment, listeners were shown written instructions that said, “You are going to hear a child say a series of words. After you hear all the words, please estimate the percent of the child's words you think you understood. Enter your estimate in the box below.” Listeners heard an audio recording of their assigned child speaker saying the complete list of 78 TOCS+ single words, one after the other. The full set of single-word audio files was loaded into VLC audio player (VideoLan, 2006), and the words were presented successively in a different randomized order for each listener until each word audio file had been played one time. The playlist automatically advanced to the next audio file after a word was played, so each listener heard all 78 words in succession in a different randomized order. After hearing the whole word list, listeners were asked to estimate the percentage of words they thought they understood and instructed to enter their estimate in an online survey. No paper or writing utensils were given or allowed for assistance with this estimation task. This task was designed to mimic what SLPs might do in a clinical setting when estimating intelligibility.
Orthographic Transcription Task
The second intelligibility rating task involved listeners orthographically transcribing the words they heard as a basis for intelligibility measurement. Each listener heard their assigned child speaker produce the TOCS+ word list again, but, this time, the words were presented one at a time, in a different randomized order, using the TOCS+ software. After each word was played, listeners were asked to type the word they thought the child said. Listeners were allowed as much time as they needed to type a word but could only hear each word one time. After each session was complete, a research assistant reviewed the listener's transcriptions to ensure that misspellings and homonyms of the target word were counted as correct identifications. Transcription intelligibility was calculated by the TOCS+ software for each listener as the percentage of the child's words they correctly transcribed.
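To make the scoring arithmetic concrete, the following minimal R sketch illustrates how a transcription intelligibility score is derived from a listener's typed responses. This is an illustration only: the words and responses are hypothetical, and in the present study this calculation was performed automatically by the TOCS+ software.

```r
# Minimal sketch of transcription intelligibility scoring (hypothetical words and
# responses; the TOCS+ software performed this calculation automatically in the study).
target      <- c("right", "light", "bike", "house", "sun")   # 5 of the 78 target words (illustrative)
transcribed <- c("right", "white", "bike", "house", "fun")   # listener's typed responses (illustrative)

# A response is scored as correct when it matches the target word; in the study,
# misspellings and homonyms of the target were also credited as correct.
n_correct <- sum(tolower(transcribed) == tolower(target))
intelligibility <- 100 * n_correct / length(target)
intelligibility  # 60: the listener correctly identified 60% of these words
```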
Statistical Analysis
All statistical analyses were conducted using R-4.3.3 (R Core Team, 2024). An alpha level of .05 was used for all statistical comparisons. Analyses for each research question were as follows:
RQ1: Does the accuracy of listeners' intelligibility estimates differ between SLP and naive listener groups? To examine the accuracy of listener estimates, we compared each listener's intelligibility estimate to the percentage of words they correctly identified on the orthographic transcription task for the same speaker. First, we used Pearson's correlations to examine the strength of the association between intelligibility estimates and transcription-based intelligibility scores for both the SLP and the naive listener groups. Next, for each listener, we calculated a difference score by subtracting the listener's intelligibility estimate from their transcription intelligibility score. The absolute value of these difference scores represents the number of percentage points by which each listener's intelligibility estimate deviated from the percentage of words they correctly transcribed. Thus, these absolute difference scores reflect the accuracy of listeners' intelligibility estimates. To examine differences in intelligibility estimation accuracy between listener groups, an independent-samples t test was used to compare absolute difference scores between the SLP and naive listener groups.
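The following R sketch outlines the sequence of steps in the RQ1 analysis described above. The data frame and variable names are illustrative placeholders rather than the study's actual data or code: Pearson correlations within each group, absolute difference scores per listener, and an independent-samples t test comparing groups.

```r
# Hypothetical data frame: one row per listener, with group membership and the two
# intelligibility measures (values are placeholders, not study data).
set.seed(1)
ratings <- data.frame(
  group         = rep(c("SLP", "Naive"), each = 18),
  estimated     = round(runif(36, 0, 60)),   # estimated intelligibility (%)
  transcription = round(runif(36, 0, 55))    # transcription intelligibility (%)
)

# Pearson correlation between estimated and transcription intelligibility within each group
by(ratings, ratings$group,
   function(d) cor.test(d$estimated, d$transcription, method = "pearson"))

# Absolute difference scores index each listener's estimation accuracy
ratings$abs_diff <- abs(ratings$transcription - ratings$estimated)

# Independent-samples t test comparing estimation accuracy between listener groups
t.test(abs_diff ~ group, data = ratings, var.equal = TRUE)
```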
RQ2: Do SLPs or naive listeners systematically overestimate or underestimate children's intelligibility, compared to transcription-based intelligibility scores? Within each listener group, paired-samples t tests were used to compare estimated intelligibility and transcription intelligibility scores to examine whether SLPs and naive listeners were systematically underestimating or overestimating intelligibility across speakers.
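A corresponding sketch of the RQ2 within-group comparison, reusing the hypothetical ratings data frame from the previous example, is shown below; the subsetting and variable names are illustrative only.

```r
# Paired-samples t tests of estimated vs. transcription intelligibility within each
# listener group (hypothetical 'ratings' data frame from the previous sketch).
slp <- subset(ratings, group == "SLP")
t.test(slp$estimated, slp$transcription, paired = TRUE)

naive <- subset(ratings, group == "Naive")
t.test(naive$estimated, naive$transcription, paired = TRUE)
```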
RQ3: Across the same group of child speakers, do SLPs' transcription-based intelligibility scores or estimated intelligibility scores differ from those of naive listeners? To compare transcription intelligibility scores and estimated intelligibility scores between the SLP and naive listener groups, we used Mann–Whitney U tests. We used a nonparametric test for these analyses due to the small sample size and because a Shapiro–Wilk normality test showed that transcription intelligibility scores were not normally distributed. We also calculated, for each child speaker, the absolute difference between SLPs' and naive listeners' mean transcription intelligibility scores and between their mean estimated intelligibility scores to examine how closely the two listener groups' scores aligned.
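The RQ3 comparisons can be sketched in the same way; again, the data frame is the hypothetical one introduced above, and wilcox.test() is R's implementation of the Mann–Whitney U test for two independent samples.

```r
# Normality check that motivated the nonparametric group comparisons (hypothetical data)
shapiro.test(ratings$transcription)

# Mann–Whitney U tests comparing the SLP and naive listener groups
wilcox.test(transcription ~ group, data = ratings)   # transcription intelligibility
wilcox.test(estimated ~ group, data = ratings)       # estimated intelligibility
```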
RQ4: How does interrater reliability of estimation-based and transcription-based measures of children's speech intelligibility compare between SLP and naive listener groups? To examine interrater reliability of intelligibility measurements, we calculated intraclass correlations (ICCs) using a two-way random-effects model, estimating agreement between raters, with “single” as the unit of analysis using the irr package (Gamer & Lemon, 2019). ICCs were calculated separately for both listener groups (i.e., SLPs and naive listeners) and for both intelligibility measurement approaches (estimated intelligibility and transcription-based intelligibility). ICC values can be interpreted as follows: Below .5 indicates poor interrater reliability, between .5 and .75 indicates moderate reliability, between .75 and .9 indicates good reliability, and above .90 indicates excellent reliability (Koo & Li, 2016).
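A sketch of the interrater reliability analysis using the irr package is shown below. The matrix layout is an assumption for illustration: rows represent the six child speakers and columns the three listeners in a group who rated each child, and all values are placeholders rather than study data.

```r
library(irr)  # provides icc()

# Hypothetical ratings matrix: 6 child speakers (rows) x 3 listeners (columns)
transcription_scores <- matrix(c(45, 50, 47,
                                 10, 12, 14,
                                 30, 34, 36,
                                 40, 44, 42,
                                  5,  8,  6,
                                  1,  2,  0),
                               nrow = 6, byrow = TRUE)

# Two-way random-effects model, absolute agreement, single-rater unit of analysis
icc(transcription_scores, model = "twoway", type = "agreement", unit = "single")
```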
Results
Accuracy of SLP and Naive Listeners' Intelligibility Estimates
Associations between estimated and transcription intelligibility scores for both listener groups are shown in Figure 1. Results of Pearson's correlations showed a significant positive association between SLPs' estimated intelligibility and transcription intelligibility scores across child speakers (r = .72, p < .001). Similarly, results also showed a significant positive association between naive listeners' estimated intelligibility and transcription intelligibility scores (r = .62, p = .006). The correlation between estimated and transcription intelligibility in the SLP group was higher than in the naive listener group, but both were in the moderate range.
Figure 1.
Correlations between estimated intelligibility and transcription intelligibility in SLP listeners (left) and NLs (right). Shaded regions indicate 95% confidence intervals. Two NLs for MSD1 and two NLs for MSD3 had identical scores for both estimated and transcription intelligibility; therefore, two data points are overlapping in the NL panel. SLP = speech-language pathologist; NL = naive listener.
For SLP listeners, the mean absolute difference between estimated and transcription intelligibility was 12.4% (SD = 9.3%). For naive listeners, the mean absolute difference between estimated and transcription intelligibility was 18.9% (SD = 13.1%). Shapiro–Wilk normality tests showed that difference scores within both listener groups were normally distributed, and a comparison of variances showed no significant difference in variances between the two listener groups; therefore, an independent-samples t test was used to compare difference scores between the SLP and naive listener groups. Results showed that difference scores did not significantly differ between listener groups, t(17) = 1.71, p = .10, Cohen's d = 0.57, suggesting that SLPs and naive listeners did not significantly differ in the accuracy of their intelligibility estimates. The distribution of absolute difference scores by listener group is shown in Figure 2.
Figure 2.
Distribution of absolute difference scores between estimated and transcription intelligibility in the naive listener (NL) and SLP listener groups. SLP = speech-language pathologist.
Within-Group Comparisons of Estimated and Transcription Intelligibility Scores
Mean estimated and transcription intelligibility scores by SLP and naive listener groups for each child speaker are shown in Table 3. Across all child speakers, results of a paired-samples t test showed that SLPs' intelligibility estimates did not significantly differ from their transcription-based intelligibility scores, t(17) = 1.34, p = .20, Cohen's d = 0.24, indicating that SLPs were not systematically overestimating or underestimating the percentage of words they understood. For the naive listener group, intelligibility estimates were significantly higher than transcription-based intelligibility scores across child speakers, t(17) = 3.39, p = .003, Cohen's d = 0.7, indicating that naive listeners were overestimating the percentage of words they understood.
Table 3.
Mean estimated and transcription intelligibility scores by SLP and naive listener groups for each child speaker.
| Child ID | Speech diagnosis | SLP estimated intelligibility M (SD) | SLP transcription intelligibility M (SD) | NL estimated intelligibility M (SD) | NL transcription intelligibility M (SD) | Difference between SLP and NL transcription intelligibility | Difference between SLP and NL intelligibility estimates |
|---|---|---|---|---|---|---|---|
| SSD1 | SSD | 52% (24%) | 47% (6%) | 43% (25%) | 43% (6%) | 4% | 9% |
| SSD2 | SSD | 23% (14%) | 11% (1%) | 27% (8%) | 9% (3%) | 2% | 4% |
| SSD3 | SSD | 33% (15%) | 36% (2%) | 52% (19%) | 31% (3%) | 5% | 19% |
| MSD1 | Dysarthria | 40% (18%) | 42% (6%) | 57% (6%) | 47% (5%) | 5% | 17% |
| MSD2 | CAS | 18% (8%) | 5% (4%) | 35% (18%) | 3% (4%) | 2% | 17% |
| MSD3 | Dysarthria | 4% (2%) | 1% (1%) | 7% (6%) | 0% (0%) | 1% | 3% |
Note. SLP = speech-language pathologist; NL = naive listener; SSD = speech sound disorder; MSD = motor speech disorder; CAS = childhood apraxia of speech.
Alignment Between SLPs and Naive Listeners for Transcription and Estimated Intelligibility Scores
Results of a Mann–Whitney U test showed no significant difference between the orthographic transcription intelligibility scores obtained by SLPs and those obtained by naive listeners, W = 167, p = .65. The difference between SLPs' and naive listeners' mean transcription intelligibility scores across the six child speakers was 3% on average (range: 1%–5%; see Table 3).
Results of a Mann–Whitney U test also showed no significant difference between the estimated intelligibility scores obtained by SLPs and those obtained by naive listeners, W = 196, p = .29. The difference between SLPs' and naive listeners' mean estimated intelligibility scores across the six child speakers was 11.5% on average (range: 3%–19%; see Table 3). Transcription and estimated intelligibility scores for both listener groups are shown in Figure 3.
Figure 3.
Distribution of transcription intelligibility scores (left) and estimated intelligibility scores (right) in the naive listener (NL) and SLP listener groups. MSD = motor speech disorder; SSD = speech sound disorder; SLP = speech-language pathologist.
Interrater Reliability of SLP and Naive Listeners' Intelligibility Measurements
In the SLP listener group, results of the ICC analysis showed that interrater reliability was excellent for transcription-based intelligibility measurements (ICC = .97, p < .001) and poor for estimated intelligibility (ICC = .46, p = .05). Similarly, in the naive listener group, results of the ICC analysis showed that interrater reliability was excellent for transcription-based intelligibility measurements (ICC = .97, p < .001) and moderate for estimated intelligibility (ICC = .51, p = .02).
Discussion
The purpose of this study was to investigate the accuracy and interrater reliability of SLPs' intelligibility estimates of young children with speech disorders and to examine how expertise influences listeners' intelligibility ratings by comparing SLPs to naive listeners. There were four primary findings: (a) SLPs and naive listeners showed similar levels of accuracy in estimating intelligibility, but both groups showed margins of estimation error above 12%; (b) SLPs' estimated and transcription intelligibility did not differ, but naive listeners' intelligibility estimates were significantly higher than their transcription intelligibility scores; (c) SLPs and naive listeners did not significantly differ in their transcription intelligibility scores; and (d) interrater reliability of estimated intelligibility was substantially lower in both SLP and naive listener groups compared to transcription intelligibility.
Accuracy of Estimated Intelligibility
In this study, we used orthographic transcription as a direct, objective measure of intelligibility and evaluated the accuracy of listeners' intelligibility estimates based on how closely their estimates matched the percentage of words they correctly transcribed. In both the SLP and naive listener groups, estimated intelligibility was moderately positively correlated with transcription intelligibility, suggesting that listener estimates broadly reflect differences in children's intelligibility across the severity spectrum. This finding is consistent with previous research showing significant positive associations between estimated and transcription-based intelligibility in children and adults (Hirsch et al., 2022; Hustad, 2006; Yorkston & Beukelman, 1978), but it does not directly address the accuracy of listener estimates.
The absolute difference between estimated intelligibility and the percentage of words transcribed correctly provided a quantitative measure of estimation accuracy for each listener and yielded information regarding the margin of error involved in estimating intelligibility. Absolute difference scores were 12.4% on average for SLP listeners and 18.9% on average for naive listeners. Prior pediatric research (Pennington et al., 2010, 2013) has shown that a 10% change in speech intelligibility, as measured through orthographic transcription, is statistically significant and clinically meaningful. The margins of error associated with intelligibility estimation identified in this study exceed that range. As such, relying on estimation to assess a child's speech intelligibility over time could result in either failing to identify a clinically meaningful change in intelligibility or incorrectly determining that intelligibility has meaningfully changed when it has not.
Absolute difference scores were not significantly different between the SLP and naive listener groups, suggesting that SLPs' experience did not improve the accuracy of their intelligibility estimates. This result was somewhat surprising, as we expected SLPs' training to improve their ability to accurately estimate intelligibility. Although SLPs are trained to listen closely for speech sound errors and disordered speech characteristics, they are not typically trained in how to accurately estimate the percentage of words they understood from a child's speech sample. Thus, the SLPs' lack of experience with this particular skill may help explain why they did not perform better than naive listeners on this task. Another possibility is that the single-word task completed by the child speakers in this study did not provide SLPs with enough information to estimate intelligibility accurately. Future research is needed to determine whether SLPs' intelligibility estimates would more closely align with their transcription scores when judging sentence-level speech samples.
Comparison of Intelligibility Estimates to Transcription Intelligibility Scores Within SLPs and Naive Listeners
Results did show a difference between SLPs and naive listeners in the direction of their estimation error. The SLP listeners' intelligibility estimates were not significantly lower or higher than their transcription-based intelligibility scores on average across children, suggesting that SLPs were not systematically underestimating or overestimating intelligibility. In contrast, naive listener estimates of intelligibility were significantly higher than their transcription-based intelligibility scores on average across children, indicating a clear pattern of overestimation. This finding is consistent with Carter et al. (1996), who found that naive listeners overestimated speakers' intelligibility when they were asked to estimate the number of words they understood after completing a sentence transcription task. In contrast, our findings differ from several other studies that have documented a pattern of underestimation by naive listeners when compared to transcription scores of sentences produced by adults with dysarthria (Hustad, 2006; O'Leary et al., 2021; Stipancic et al., 2016).
One possible reason for the overestimation by naive listeners in this study is the single-word speech stimuli used. In contrast to the sentence stimuli used in most prior research, single words do not provide listeners with any contextual cues to help with identification of words. The TOCS+ word list was designed to test intelligibility of phonetic contrasts (Hodge & Gotzke, 2011), and thus, the target words have multiple phonological neighbors (e.g., “right,” “white,” and “light”). Naive listeners may have perceived common substitution errors (e.g., a child saying “white” for “right”) as words they correctly understood, thus resulting in overestimation of the child's intelligibility. In addition, preschool-aged children are not expected to be fully intelligible (Hustad et al., 2021) and, thus, naive listeners may have perceived children's speech as sounding age appropriate despite some obvious articulation errors, which may have biased them toward overestimating the percentage of words they understood. In contrast, SLPs may have used their prior knowledge of typical error patterns in children's speech development when listening to the word stimuli, thus making them less likely to overestimate the child's intelligibility.
It is also possible that the estimation task was more challenging than the orthographic transcription task because it required listeners to sustain selective attention to a child speaking a list of unrelated single words before estimating the child's overall speech intelligibility. While SLPs may have been able to rely on their training and experience to parse items from the list and mentally track the words they understood as the speaker continued, naive listeners, with no prior framework to process and retain the information, may have been overwhelmed. This, in conjunction with the sensation of being tested and the desire to be successful, may have contributed to the consistent pattern of overestimation observed among naive listeners.
Comparison of Transcription-Based Intelligibility Scores and Estimated Intelligibility Scores Between SLP and Naive Listener Groups
Our results showed that SLPs and naive listeners did not significantly differ in their transcription-based intelligibility scores at a group level, suggesting that, on average, SLPs did not understand more words produced by the children with speech disorders than the naive listeners did. SLPs' mean transcription intelligibility scores and naive listeners' mean transcription intelligibility scores differed by 5% or less for all six child speakers. This finding is consistent with prior studies that found strong correspondence between intelligibility scores obtained from SLPs and untrained listeners (Hashemi Hosseinabad et al., 2022; Hirsch et al., 2022; Smith et al., 2019; Walshe et al., 2008), but contrasts with other studies that have found SLPs understand significantly more speech than naive listeners (e.g., O'Leary et al., 2021). Although preliminary due to our small sample size, these results suggest good alignment between transcription intelligibility scores obtained by SLPs and naive listeners for young children with speech disorders within the context of a single-word intelligibility task. If confirmed in a larger sample, this suggests that SLPs may be able to rely on orthographic transcription of single words by their SLP colleagues to obtain quantitative intelligibility scores that can be interpreted in the context of published norms (Hustad et al., 2021).
Results also showed that estimated intelligibility scores did not differ between SLPs and naive listeners at a group level, suggesting that SLPs and naive listeners were also similar in the proportion of words they thought they understood across the child speakers. SLPs' mean estimated intelligibility scores and naive listeners' mean estimated intelligibility scores aligned more closely for some child speakers than others. Transcription and estimated intelligibility scores were most closely aligned between SLPs and naive listeners for the child with the most severe speech impairment (MSD3). It is possible that child characteristics, such as severity of speech impairment, may influence how closely intelligibility estimates match between SLPs and naive listeners; however, these factors would need to be explored with a larger sample of child speakers.
Interrater Reliability of Intelligibility Measures
Our results also showed that interrater reliability of estimated intelligibility was poor to moderate in both SLP and naive listener groups. In contrast, interrater reliability of transcription-based intelligibility measurements was excellent in both groups. This suggests that transcription methods are far more reliable than subjective estimation, regardless of the listener's expertise. Our results are consistent with prior research showing greater variability in estimated intelligibility than transcription intelligibility among naive listeners (Hustad, 2006; Yorkston & Beukelman, 1978). The poor interrater reliability among SLP listeners' estimates in this study (ICC = .46) is similar to findings of Hirsch et al. (2022), who reported moderate interrater reliability (ICC = .56) among SLPs' intelligibility estimates of adult speakers. These results emphasize that the intelligibility assessment method is more important than listener expertise for obtaining reliable measurement across individual raters.
There are many potential sources of variability that contribute to the reduced reliability of intelligibility estimates across listeners in both groups. Subjective judgments of speech rely on the listener's internal model and expectations for how a speaker should sound, as well as how they perceive and weigh various speech characteristics when making judgments (Miller, 2013). Findings of the current study suggest that these perceptual limitations apply to SLPs to the same degree as naive listeners, thus supporting the need for objective intelligibility measurement in clinical settings. Interrater reliability is important for measuring intelligibility in clinical settings to ensure consistency in evaluation and treatment. If one SLP evaluates a client and uses estimation to obtain a baseline intelligibility measure and another SLP treats that client and uses estimation to monitor progress, it is likely that their estimates will not reflect the child's actual intelligibility change. This may result in SLPs incorrectly reporting a decline in progress or an inflated improvement, directly affecting the subsequent services that client will receive.
Limitations and Future Directions
This study is preliminary, and results should be interpreted in the context of its small sample size. Our SLP and naive listener groups were not equivalent in terms of age, which may have contributed to differences in ratings between groups (Dagenais et al., 2011); however, listener expertise was likely a much larger factor in listener judgments. Listeners were tested in two different environments (i.e., a sound booth and a quiet clinic room), which may have had a small influence on listener ratings; however, identical headphones and computers were used to test listeners across sites, thus minimizing any potential effects of slight differences in the listening environment. All listeners completed the estimation task first and the transcription task second; thus, it is possible that listeners' transcription scores may have been slightly higher than if they had had only one exposure to the child's word productions. We were not able to compare intrarater reliability of intelligibility estimates to that of orthographic transcription due to our study design, but future studies including this comparison may yield meaningful information regarding how consistent individual SLPs are when rating intelligibility. SLP listeners varied in their level of experience in working with children with SSDs and MSDs. Future research is needed to more fully understand the role of SLP training and varying levels of experience on intelligibility ratings. The child speakers had intelligibility scores ranging from approximately 0% to 50%, so the sample did not represent the full spectrum of intelligibility levels possible in preschool-aged children. Further research is needed to understand the accuracy and reliability of SLPs' intelligibility estimates across the full severity range and to assess whether there are differences in accuracy and reliability across different diagnostic groups.
Conclusions and Clinical Implications
Overall, results of this preliminary study suggest that SLPs' informal intelligibility estimates of preschool-aged children are not adequately accurate or reliable for use in clinical practice. SLPs' intelligibility estimates differed from the percentage of words they correctly transcribed by 12.4% on average. This is a large margin of error that could meaningfully impact an SLP's clinical decisions regarding eligibility for services or treatment goals. In addition, interrater reliability of SLPs' intelligibility estimates was poor, which has important implications for children who are seen by multiple clinicians. If a child is being seen for speech services in both school and a private practice setting, for example, the two SLPs may estimate the child's intelligibility differently, leading to inconsistency in documentation and, potentially, treatment goals. Similarly, if a child is evaluated by two different SLPs across time points, reliance on intelligibility estimates could lead to inaccurate conclusions regarding the change in a child's intelligibility over time. While intelligibility estimation has historically been considered a time-saving practice, the low accuracy and reliability of this approach could lead to inaccurate therapeutic goals, dosage, and clinical methods, prolonging the course of a child's treatment. The high interrater reliability seen in scores obtained from transcription-based intelligibility assessment indicates that this type of objective assessment can be confidently used to quantify intelligibility, consistent with prior research.
Our results also showed that SLPs' intelligibility estimates were similar in accuracy and interrater reliability to estimates by naive listeners, suggesting that listener expertise did not improve performance in estimating intelligibility. On the single-word transcription intelligibility task, SLPs also did not significantly differ from naive listeners in the percentage of words they understood. Although this finding would need to be replicated with a larger sample, it suggests that orthographic transcriptions by SLPs might yield intelligibility scores comparable to those reported in published studies using naive listeners. If confirmed, this would support SLPs' use of colleagues as listeners to provide orthographic transcriptions for measuring intelligibility in clinical settings, reducing one barrier to clinical implementation of this approach and potentially increasing clinical uptake of published intelligibility data. Given the multiple recognized barriers to implementing transcription-based intelligibility testing in clinical settings, more research is needed on ways to increase clinical uptake of objective intelligibility measurement. Identifying the precise barriers to, and facilitators of, the use of speech intelligibility testing across clinical environments would help inform the development of more clinically feasible objective intelligibility assessment methods. In addition, technology-driven methods for automating objective intelligibility measurement are currently being developed (Huang et al., 2021; Jiao et al., 2019) and hold promise for future clinical implementation.
Data Availability Statement
Anonymized data from this project are available on reasonable request from the first author.
Acknowledgments
This research was supported by National Institute on Deafness and Other Communication Disorders Grant R21DC019721 and by an internal Transforming Interdisciplinary Experiential Research 1 seed grant from Northeastern University awarded to Kristen Allison. The authors would like to thank Loukia Aydag and Maggie Camelio for their assistance with data collection for this study.
References
- Abur, D., Enos, N. M., & Stepp, C. E. (2019). Visual analog scale ratings and orthographic transcription measures of sentence intelligibility in Parkinson's disease with variable listener exposure. American Journal of Speech-Language Pathology, 28(3), 1222–1232. 10.1044/2019_AJSLP-18-0275
- Allison, K. M. (2020). Measuring speech intelligibility in children with motor speech disorders. Perspectives of the ASHA Special Interest Groups, 5(4), 809–820. 10.1044/2020_persp-19-00110
- Allison, K. M., Doherty, K. M., & for the Cerebral Palsy Research Network. (2024). Relation of speech-language profile and communication modality to participation of children with cerebral palsy. American Journal of Speech-Language Pathology, 33(2), 1040–1050. 10.1044/2023_AJSLP-23-00267
- American Speech-Language-Hearing Association. (2023). National Outcomes Measurement System (NOMS): Adults in Healthcare–Outpatient National Data Report 2023. https://www.asha.org/NOMS
- Audacity Team. (2014). Audacity(R): Free audio editor and recorder (Version 3.4) [Computer program]. http://audacity.sourceforge.net/
- Borrie, S. A., Lansford, K. L., & Barrett, T. S. (2017). Generalized adaptation to dysarthric speech. Journal of Speech, Language, and Hearing Research, 60(11), 3110–3117. 10.1044/2017_JSLHR-S-17-0127
- Borrie, S. A., Lansford, K. L., & Barrett, T. S. (2021). A clinical advantage: Experience informs recognition and adaptation to a novel talker with dysarthria. Journal of Speech, Language, and Hearing Research, 64(5), 1503–1514. 10.1044/2021_JSLHR-20-00663
- Carrow-Woolfolk, E. (2014). Test for Auditory Comprehension of Language–Fourth edition. Pro-Ed.
- Carter, C., Yorkston, K., Strand, E., & Hammen, V. (1996). Effects of semantic and syntactic context on actual and estimated sentence intelligibility of dysarthric speakers. In D. Robin, K. Yorkston, & D. Beukelman (Eds.), Disorders of motor speech: Assessment, treatment, and clinical characterization (pp. 67–87). Brookes.
- Connaghan, K. P., Baylor, C., Romanczyk, M., Rickwood, J., & Bedell, G. (2022). Communication and social interaction experiences of youths with congenital motor speech disorders. American Journal of Speech-Language Pathology, 31(6), 2609–2627. 10.1044/2022_AJSLP-22-00034
- Contardo, I., McAllister, A., & Strömbergsson, S. (2014). Real-time registration of listener reactions to unintelligibility in misarticulated child speech. Proceedings from FONETIK 2014, 127–132.
- Dagenais, P. A., Adlington, L. M., & Evans, K. J. (2011). Intelligibility, comprehensibility, and acceptability of dysarthric speech by older and younger listeners. Journal of Medical Speech-Language Pathology, 19(4), 37–48.
- Enderby, P. M. (1983). Frenchay dysarthria assessment. College-Hill Press.
- Ertmer, D. J. (2011). Assessing speech intelligibility in children with hearing loss: Toward revitalizing a valuable clinical tool. Language, Speech and Hearing Services in Schools, 42(1), 52–58. 10.1044/0161-1461(2010/09-0081)
- Farquharson, K., & Tambyraja, S. R. (2019). Describing how school-based SLPs determine eligibility for children with speech sound disorders. Seminars in Speech and Language, 40(02), 105–112. 10.1055/s-0039-1677761
- Flipsen, P., Jr. (1995). Speaker–listener familiarity: Parents as judges of delayed speech intelligibility. Journal of Communication Disorders, 28(1), 3–19. 10.1016/0021-9924(94)00015-R
- Fudala, J. B. (2000). Arizona Articulation Proficiency Scale–Third revision. Western Psychological Services.
- Gamer, M., Lemon, J., & Singh, I. F. P. (2019). irr: Various coefficients of interrater reliability and agreement (Version 0.84.1). R Foundation for Statistical Computing. https://CRAN.R-project.org/package=irr
- Gordon-Brannan, M., & Hodson, B. W. (2000). Intelligibility/severity measurements of prekindergarten children's speech. American Journal of Speech-Language Pathology, 9(2), 141–150. 10.1044/1058-0360.0902.141
- Gurevich, N., & Scamihorn, S. L. (2017). Speech-language pathologists' use of intelligibility measures in adults with dysarthria. American Journal of Speech-Language Pathology, 26(3), 873–892. 10.1044/2017_AJSLP-16-0112
- Hashemi Hosseinabad, H., Washington, K. N., Boyce, S. E., Silbert, N., & Kummer, A. W. (2022). Assessment of intelligibility in children with velopharyngeal insufficiency: The relationship between Intelligibility in Context Scale and experimental measures. Folia Phoniatrica et Logopaedica, 74(1), 17–28. 10.1159/000516537
- Hirsch, M. E., Thompson, A., Kim, Y., & Lansford, K. L. (2022). The reliability and validity of speech-language pathologists' estimations of intelligibility in dysarthria. Brain Sciences, 12(8), Article 1011. 10.3390/brainsci12081011
- Hodge, M., & Daniels, J. (2007). TOCS+ intelligibility measures. University of Alberta.
- Hodge, M. M., & Gotzke, C. L. (2011). Minimal pair distinctions and intelligibility in preschool children with and without speech sound disorders. Clinical Linguistics & Phonetics, 25(10), 853–863. 10.3109/02699206.2011.578783
- Hodge, M. M., & Gotzke, C. L. (2014). Construct-related validity of the TOCS measures: Comparison of intelligibility and speaking rate scores in children with and without speech disorders. Journal of Communication Disorders, 51, 51–63. 10.1016/j.jcomdis.2014.06.007
- Huang, A., Hall, K., Watson, C., & Shahamiri, S. R. (2021). A review of automated intelligibility assessment for dysarthric speakers. 2021 International Conference on Speech Technology and Human–Computer Dialogue (SpeD), 19–24. 10.1109/SpeD53181.2021.9587400
- Hustad, K. C. (2006). Estimating the intelligibility of speakers with dysarthria. Folia Phoniatrica et Logopaedica, 58(3), 217–228. 10.1159/000091735
- Hustad, K. C., & Cahill, M. A. (2003). Effects of presentation mode and repeated familiarization on intelligibility of dysarthric speech. American Journal of Speech-Language Pathology, 12(2), 198–208. 10.1044/1058-0360(2003/066)
- Hustad, K. C., Mahr, T. J., Broman, A. T., & Rathouz, P. J. (2020). Longitudinal growth in single-word intelligibility among children with cerebral palsy from 24 to 96 months of age: Effects of speech-language profile group membership on outcomes. Journal of Speech, Language, and Hearing Research, 63(1), 32–48. 10.1044/2019_JSLHR-19-00033
- Hustad, K. C., Mahr, T. J., Natzke, P., & Rathouz, P. J. (2021). Speech development between 30 and 119 months in typical children I: Intelligibility growth curves for single-word and multiword productions. Journal of Speech, Language, and Hearing Research, 64(10), 3707–3719. 10.1044/2021_JSLHR-21-00142
- Hustad, K. C., Oakes, A., & Allison, K. (2015). Variability and diagnostic accuracy of speech intelligibility scores in children. Journal of Speech, Language, and Hearing Research, 58(6), 1695–1707. 10.1044/2015_JSLHR-S-14-0365
- Hustad, K. C., Sakash, A., Broman, A. T., & Rathouz, P. J. (2019). Differentiating typical from atypical speech production in 5-year-old children with cerebral palsy: A comparative analysis. American Journal of Speech-Language Pathology, 28(2S), 807–817. 10.1044/2018_AJSLP-MSC18-18-0108
- Hustad, K. C., Schueler, B., Schultz, L., & DuHadway, C. (2012). Intelligibility of 4-year-old children with and without cerebral palsy. Journal of Speech, Language, and Hearing Research, 55(4), 1177–1189. 10.1044/1092-4388(2011/11-0083)
- Jiao, Y., LaCross, A., Berisha, V., & Liss, J. (2019). Objective intelligibility assessment by automated segmental and suprasegmental listening error analysis. Journal of Speech, Language, and Hearing Research, 62(9), 3359–3366. 10.1044/2019_JSLHR-S-19-0119
- Kent, R. D., Miolo, G., & Bloedel, S. (1994). The intelligibility of children's speech: A review of evaluation procedures. American Journal of Speech-Language Pathology, 3(2), 81–95. 10.1044/1058-0360.0302.81
- King, J. M., Watson, M., & Lof, G. L. (2012). Practice patterns of speech-language pathologists assessing intelligibility of dysarthric speech. Journal of Medical Speech-Language Pathology, 20(1), 1–10.
- Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. 10.1016/j.jcm.2016.02.012
- Kwiatkowski, J., & Shriberg, L. D. (1992). Intelligibility assessment in developmental phonological disorders: Accuracy of caregiver gloss. Journal of Speech, Language, and Hearing Research, 35(5), 1095–1104. 10.1044/jshr.3505.1095
- Lee, J., Hustad, K. C., & Weismer, G. (2014). Predicting speech intelligibility with multiple speech subsystems approach in children with cerebral palsy. Journal of Speech, Language, and Hearing Research, 57(5), 1666–1678. 10.1044/2014_JSLHR-S-13-0292
- Lundeborg, I., & McAllister, A. (2007). Treatment with a combination of intra-oral sensory stimulation and electropalatography in a child with severe developmental dyspraxia. Logopedics Phoniatrics Vocology, 32(2), 71–79. 10.1080/14015430600852035
- Mahr, T. J., & Hustad, K. C. (2023). Lexical predictors of intelligibility in young children's speech. Journal of Speech, Language, and Hearing Research, 66(8S), 3013–3025. 10.1044/2022_JSLHR-22-00294
- Mahr, T. J., Rathouz, P. J., & Hustad, K. C. (2020). Longitudinal growth in intelligibility of connected speech from 2 to 8 years in children with cerebral palsy: A novel Bayesian approach. Journal of Speech, Language, and Hearing Research, 63(9), 2880–2893. 10.1044/2020_JSLHR-20-00181
- McHenry, M. (2011). An exploration of listener variability in intelligibility judgments. American Journal of Speech-Language Pathology, 20(2), 119–123. 10.1044/1058-0360(2010/10-0059)
- McLeod, S., & Baker, E. (2014). Speech-language pathologists' practices regarding assessment, analysis, target selection, intervention, and service delivery for children with speech sound disorders. Clinical Linguistics & Phonetics, 28(7–8), 508–531. 10.3109/02699206.2014.926994
- McLeod, S., Harrison, L. J., & McCormack, J. (2012). The Intelligibility in Context Scale: Validity and reliability of a subjective rating measure. Journal of Speech, Language, and Hearing Research, 55(2), 648–656. 10.1044/1092-4388(2011/10-0130)
- Mei, C., Reilly, S., Reddihough, D., Mensah, F., & Morgan, A. (2014). Motor speech impairment, activity, and participation in children with cerebral palsy. International Journal of Speech-Language Pathology, 16(4), 427–435. 10.3109/17549507.2014.917439
- Miller, N. (2013). Measuring up to speech intelligibility. International Journal of Language & Communication Disorders, 48(6), 601–612. 10.1111/1460-6984.12061
- Most, T., Ingber, S., & Heled-Ariam, E. (2012). Social competence, sense of loneliness, and speech intelligibility of young children with hearing loss in individual inclusion and group inclusion. The Journal of Deaf Studies and Deaf Education, 17(2), 259–272. 10.1093/deafed/enr049
- Mullen, R., & Schooling, T. (2010). The National Outcomes Measurement System for pediatric speech-language pathology. Language, Speech, and Hearing Services in Schools, 41(1), 44–60. 10.1044/0161-1461(2009/08-0051)
- O'Leary, D., Lee, A., O'Toole, C., & Gibbon, F. (2021). Intelligibility in Down syndrome: Effect of measurement method and listener experience. International Journal of Language & Communication Disorders, 56(3), 501–511. 10.1111/1460-6984.12602
- Pennington, L., & McConachie, H. (2001). Interaction between children with cerebral palsy and their mothers: The effects of speech intelligibility. International Journal of Language & Communication Disorders, 36(3), 371–393. 10.1080/13682820110045847
- Pennington, L., Miller, N., Robson, S., & Steen, N. (2010). Intensive speech and language therapy for older children with cerebral palsy: A systems approach. Developmental Medicine and Child Neurology, 52(4), 337–344. 10.1111/j.1469-8749.2009.03366.x
- Pennington, L., Parker, N. K., Kelly, H., & Miller, N. (2016). Speech therapy for children with dysarthria acquired before three years of age. Cochrane Database of Systematic Reviews, 2016(7), Article CD006937. 10.1002/14651858.CD006937.pub3
- Pennington, L., Roelant, E., Thompson, V., Robson, S., Steen, N., & Miller, N. (2013). Intensive dysarthria therapy for younger children with cerebral palsy. Developmental Medicine & Child Neurology, 55(5), 464–471. 10.1111/dmcn.12098
- R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
- Samar, V. J., & Metz, D. E. (1988). Criterion validity of speech intelligibility rating-scale procedures for the hearing-impaired population. Journal of Speech, Language, and Hearing Research, 31(3), 307–316. 10.1044/jshr.3103.307
- Schiavetti, N., Metz, D. E., & Sitler, R. W. (1981). Construct validity of direct magnitude estimation and interval scaling of speech intelligibility: Evidence from a study of the hearing impaired. Journal of Speech, Language, and Hearing Research, 24(3), 441–445. 10.1044/jshr.2403.441
- Skahan, S. M., Watson, M., & Lof, G. L. (2007). Speech-language pathologists' assessment practices for children with suspected speech sound disorders: Results of a national survey. American Journal of Speech-Language Pathology, 16(3), 246–259. 10.1044/1058-0360(2007/029)
- Smith, C. H., Patel, S., Woolley, R. L., Brady, M. C., Rick, C. E., Halfpenny, R., Rontiris, A., Knox-Smith, L., Dowling, F., Clarke, C. E., Au, P., Ives, N., Wheatley, K., & Sackley, C. M. (2019). Rating the intelligibility of dysarthic speech amongst people with Parkinson's disease: A comparison of trained and untrained listeners. Clinical Linguistics & Phonetics, 33(10–11), 1063–1070. 10.1080/02699206.2019.1604806
- Stipancic, K. L., Tjaden, K., & Wilding, G. (2016). Comparison of intelligibility measures for adults with Parkinson's disease, adults with multiple sclerosis, and healthy controls. Journal of Speech, Language, and Hearing Research, 59(2), 230–238. 10.1044/2015_JSLHR-S-15-0271
- Tjaden, K., Sussman, J. E., & Wilding, G. E. (2014). Impact of clear, loud, and slow speech on scaled intelligibility and speech severity in Parkinson's disease and multiple sclerosis. Journal of Speech, Language, and Hearing Research, 57(3), 779–792. 10.1044/2014_JSLHR-S-12-0372
- VideoLan. (2006). VLC media player. https://www.videolan.org/vlc/index.html
- Walshe, M., Miller, N., Leahy, M., & Murray, A. (2008). Intelligibility of dysarthric speech: Perceptions of speakers and listeners. International Journal of Language & Communication Disorders, 43(6), 633–648. 10.1080/13682820801887117
- Weismer, G., & Laures, J. S. (2002). Direct magnitude estimates of speech intelligibility in dysarthria: Effects of a chosen standard. Journal of Speech, Language, and Hearing Research, 45(3), 421–433. 10.1044/1092-4388(2002/033)
- Yorkston, K. M., & Beukelman, D. R. (1978). A comparison of techniques for measuring intelligibility of dysarthric speech. Journal of Communication Disorders, 11(6), 499–512. 10.1016/0021-9924(78)90024-2
- Yorkston, K. M., Strand, E. A., & Kennedy, M. R. T. (1996). Comprehensibility of dysarthric speech: Implications for assessment and treatment planning. American Journal of Speech-Language Pathology, 5(1), 55–66. 10.1044/1058-0360.0501.55