Abstract
Purpose
Gap duration contributes to the perception of utterances as fluent or disfluent, but few studies have systematically investigated the impact of gap duration on fluency judgments. The purposes of this study were to determine how gaps impact disfluency perception, and how listener background and experience impact these judgments.
Methods
Sixty participants (20 adults who stutter [AWS], 20 speech-language pathologists [SLPs], and 20 naïve listeners) listened to four tokens of the utterance, “Buy Bobby a puppy,” produced at typical speech rates. The gap duration between “Buy” and “Bobby” was systematically manipulated with gaps ranging from 23.59 ms to 325.44 ms. Participants identified stimuli as fluent or disfluent.
Results
The disfluency threshold – the point at which 50% of trials were categorized as disfluent – occurred at a gap duration of 126.46 ms, across all participants and tokens. The SLPs exhibited higher disfluency thresholds than the AWS and the naïve listeners.
Conclusion
This study determined, based on the specific set of stimuli used, when the perception of utterances tends to shift from fluent to disfluent. Group differences indicated that SLPs are less inclined to identify disfluencies in speech potentially because they aim to be less critical of speech that deviates from “typical.”
Keywords: fluency, disfluency, stuttering, gap duration
1. Introduction
Fluency refers to the uninterrupted, automatic, effortless, and continuous flow of speech (Fillmore, 1979; Starkweather, 1987). A breakdown in any of these characteristics could lead to speech being perceived as disfluent. For example, a short hesitation or pause during speech may contribute to the perception of disfluency by the listener. However, identifying fluent versus disfluent speech presents challenges because fluency operates on a continuum and is based on subjective assessment which is impacted by listener background and life experiences (Brundage et al., 2006; Cordes et al., 1992; Finn & Ingham, 1989). This has important research and clinical implications for identifying disorders such as stuttering, which can rely on the classification of fluent and disfluent speech. To begin to delimit how listeners perceive an utterance as fluent versus disfluent, the current study examined the impact of one parameter of fluency, gap duration, on perceptions of fluent versus disfluent speech within specific range and frequency parameters (e.g., Durlach & Braida, 1969; Parducci, 2012; Parducci et al., 1960). In addition, this study assessed the impact of listener background and experience on the perception of fluent versus disfluent speech.
1.1. What determines whether a listener perceives an utterance as fluent or disfluent?
Studies examining fluency in stuttering or in second language use show that speech rate, stress timing, repairs, and gap duration and frequency (i.e., silent and filled pauses) are all linked to the perception of utterance fluency (Bosker et al., 2014; Cucchiarini et al., 2010; Duez, 1982, 1985; Goldman-Eisler, 1961; Krivokapić, 2007; Love & Jeffress, 1971; Trofimovich & Baker, 2006). For instance, studies of non-native speakers that examined gap duration and its impact on fluency typically examined many variables (e.g., interjections) in addition to gap duration, to determine which variables contribute to global fluency ratings (Bosker et al., 2014; Cucchiarini et al., 2010). Listeners take into account many, if not all, of these characteristics when assessing whether the utterance they heard was fluent or disfluent (Bosker et al., 2014; Segalowitz, 2010). However, in the context of speech disorders such as stuttering, some of these characteristics may be more salient than others. Gap duration is a particularly salient variable in judging fluency, and listeners are able to perceive gaps easily (Few & Lingwall, 1972; Love & Jeffress, 1971; Martin & Strange, 1968; Prosek & Runyan, 1983). Gap duration refers to the length of time between syllables, represented by the absence (or low level) of acoustic energy (Lickley, 1994). As the overt features of stuttering include syllable repetitions and audible and inaudible prolongations of sounds, extended gaps in the speech of people who stutter (i.e., blocks) are inevitable. However, not all gaps contribute to the perception of disfluency. Rather, it is those gaps that disrupt the flow of speech that cause listeners to perceive speech as disfluent (Lövgren & Doorn, 2005). These gaps are referred to as hesitations or within-constituent pauses (Duez, 1982, 1985; Ruder & Jensen, 1972) and typically occur at non-syntactic boundaries (where pausing is intrinsically more variable).
The current study focused on the impact of gap duration on perceptions of fluent and disfluent speech based on utterances produced at a typical speaking rate.
Most studies of the impact of gap duration on fluency perception focused on the observably “fluent” speech of speakers who stutter to determine whether pauses contributed to listeners judging entire speech samples as fluent or disfluent. Some studies found that listeners (including trained listeners) were unable to distinguish between the fluent speech of stuttering and non-stuttering speakers based on gap duration (Few & Lingwall, 1972; Krikorian & Runyan, 1983; Love & Jeffress, 1971). Other studies found listeners were able to distinguish between the fluent speech of adults who stutter (AWS) and adults who do not stutter (AWNS) based on gap duration (Brown & Colcord, 1987; Howell & Wingfield, 1990; Young, 1984). Studies of both stuttering and non-native speakers have typically included only gap durations greater than 200 ms, which is problematic because listeners are able to perceive gaps in speech below 150 ms (Campione & Véronis, 2002; Hieke et al., 1983). None of these studies systematically manipulated gap duration to determine the specific impact of gap duration on the perception of disfluency.
One study systematically manipulated gap duration. Lövgren and Doorn (2005) collected short speech samples containing natural silent pauses from various news programs. These natural pauses were systematically manipulated, resulting in four conditions: C1 (unmanipulated) to C4 (longest durations). With four source utterances and four pause conditions per sample, this resulted in 16 samples. Mean gap durations, per condition, from the 16 speech samples were: C1 = 57 ms; C2 = 117 ms; C3 = 212 ms; C4 = 403 ms. Participants judged whether each speech sample was fluent or disfluent in a forced-choice task. Results indicated that C2 was judged disfluent 34% of the time, whereas C3 was judged disfluent 72% of the time. Lövgren and Doorn (2005) reported a gap duration threshold for fluency between 117 and 212 ms. This threshold could have been impacted by context (e.g., speech rate, prosody) and range effects, indicating that the threshold is specific to the gap durations included in that study (e.g., Durlach & Braida, 1969; Parducci, 2012; Parducci et al., 1960). However, the methodology was somewhat coarse-grained, as using the mean gap duration of separate instances of pauses within a 15–20 s speech sample precludes examining the impact of specific gap durations on fluency perception. Thus, an examination of the impact of gap duration on fluency perception using a more fine-grained manipulation for a single, short utterance is warranted.
1.2. The Impact of Listener Background and Experience on Fluency Judgments
Different listener backgrounds and experience impact perceptions of disfluency. This is due to previous experience hearing different communication styles or impaired speakers (e.g., speakers who stutter), as well as to differences in professional training (e.g., naïve listeners vs. speech-language pathologists [SLPs]). For example, Cordes et al. (1992) studied the impact of differing amounts of clinical experience (i.e., undergraduate students, graduate students, experienced clinicians) on fluency perception. The undergraduate students’ previous experience with stuttering only included a few hours of lecture in an introductory speech and hearing course, which included viewing approximately one half-hour of videotaped people who stutter. The graduate student group included students from the speech-language pathology department. Experienced clinicians included clinicians and researchers with extensive experience in stuttering (i.e., a combination of doctoral students, clinical supervisors and professors specializing in stuttering). Cordes et al. (1992) reported that the graduate student group and the experienced clinician group exhibited similar increased agreement in identifying disfluencies, when compared to the undergraduate group. Additionally, the graduate students and the experienced clinicians identified more subtle stuttering events (i.e., stuttering events lasting less than 200 ms), versus the undergraduate group. These results signify that additional experience leads to changes in identifying disfluencies only when compared to a novice group, the undergraduate student group. Furthermore, in a comparison of students, speech-language pathologists (not stuttering experts), and highly experienced judges, Brundage et al. (2006) found that students and non-expert SLPs did not exhibit differences in intrajudge and interjudge agreement, and that both groups identified less than 50% as much stuttering as highly experienced judges. 
These findings indicated that consistency in stuttering judgments and accuracy in identifying stuttering improve only when the judges are experts in the field. The student group comprised university students enrolled in Communicative Sciences and Disorders classes; the majority of these students had not taken a class related to stuttering. The non-expert SLPs were American Speech-Language-Hearing Association (ASHA) certified SLPs and had a range of clinical experience, between 1.5 and 30 years. The experienced judges were deemed stuttering specialists because they had a publication record in the area of stuttering spanning more than eight years (Cordes & Ingham, 1995). Interestingly, AWS are more likely to report disfluencies than AWNS (Lickley et al., 2005). For example, when AWS and AWNS listen to utterances containing single instances of disfluency, AWS are more likely to identify speech as disfluent, regardless of whether that speech is produced by an AWS or an AWNS. These findings suggest that the life experiences of AWS cause them to have more sensitive criteria for perceiving disfluencies in speech than AWNS. AWS may have increased motivation to listen to speech, versus other groups of listeners, which may have impacted their speech perception skills. Comparing gap duration thresholds between AWS, SLPs, and naïve listeners will elucidate the impact of listener background and experience on perceiving utterances as fluent or disfluent.
1.3. Purpose
The purposes of this study were 1) to estimate a gap duration threshold for disfluency perception (the point in time at which a gap typically leads to perceptions of disfluent speech), for material consisting of short utterances with typical speech rate, which contain one manipulated gap within a specific duration range (i.e., 23.59 ms - 325.44 ms), and 2) to determine the impact of listener experience and background on disfluency perception. As no previous studies have found a threshold for disfluency including the range and frequency of gaps used in the present study, our first research question was exploratory, and we did not hypothesize a gap duration threshold for disfluency. Concerning the second research question, we tested the impact of listener background and experience on fluency perception by comparing three groups: SLPs, AWS, and naïve listeners. Based on Lickley et al. (2005), we hypothesized that AWS would have the lowest gap threshold. Based on Brundage et al. (2006), we hypothesized that SLPs and naïve listeners would exhibit similar thresholds.
2. Methods
2.1. Participants
This study was approved by the Institutional Review Board at New York University (NYU). Participants provided informed consent to participate. Sixty participants ranging in age from 18 to 50 years (M = 28.9, SD = 6.0) comprised three groups: 1) 20 ASHA certified SLPs (19 women; mean age = 33.6, SD = 5.1); 2) 20 AWS (six women; mean age = 28, SD = 6.9); and 3) 20 naïve listeners defined as neurotypical adults who were not SLPs or AWS (10 women; mean age = 25.8, SD = 1.4). Gender ratios aimed to be reflective of gender ratios within each participant group. The SLP participant group was predominantly female, as 96% of SLPs are female (ASHA, 2019); the AWS group was primarily male, as AWS are predominantly male (Yairi et al., 1996); and the naïve group was evenly distributed between male and female participants. All participants were self-reported native speakers of Standard American English (i.e., learned it before age six and spoke it daily). Participants had typical hearing, confirmed by a hearing screening on the day of the experiment. Participants passed the hearing screening at 25 dB HL at three frequencies: 500 Hz, 2000 Hz, and 8000 Hz, modeled after the standard NYU clinical hearing screening protocol. One participant did not respond to pure tones at 8000 Hz during the screening but was still included in the experiment because the acoustic parameter of interest did not depend on the high frequencies. Participants reported a negative history of neurological and speech-language impairment, except for the subgroup of AWS who reported stuttering. The SLP participants were all ASHA certified speech-language pathologists whose years of clinical practice ranged from 2 to 23 years (M = 9.0, SD = 5.2). SLP participants had a variety of clinical experience with fluency disorders; however, none specialized in stuttering, and none were AWS.
The AWS were recruited from the fourth author’s research database of stutterers, and all were previously diagnosed as AWS by a certified SLP who specializes in stuttering intervention. Naïve and SLP participants were recruited via word of mouth as well as flyers posted around the NYU campus.
2.2. Stimuli
Stimuli were generated using two fluent and two disfluent versions of the utterance, “Buy Bobby a puppy.” The original and unmanipulated utterances (i.e., tokens) were collected from a previous study (Jackson, Tiede, Beal, et al., 2016). Four tokens were used to increase generalizability of the findings. These tokens were selected in part because they exhibit a typical speech rate, comparable to the “normal rate” condition in Smith and Kleinow (2000; see their Figure 1). Oscillograms and spectrograms for the four tokens are provided in Appendix B. Token 1 and Token 2 were produced by a female AWS and a male AWNS, respectively. Token 3 and Token 4 were produced by the same female AWS and a different male AWNS. The disfluent tokens, Token 3 and Token 4, had a naturally occurring longer gap between “Buy” and “Bobby” than the fluent tokens, Token 1 and Token 2. The fluent and disfluent tokens were categorized by an SLP with five years of experience and expertise in stuttering intervention and confirmed by two additional SLPs. The gaps occurred between the words “Buy” and “Bobby,” and this position was used for all tokens. The gap began at the conclusion of the diphthong /aɪ/ in “Buy” and ended at the stop release for the first /b/ in “Bobby,” confirmed by no visible acoustic energy on the spectrogram. No other disfluencies occurred within the tokens besides the gap between the words “Buy” and “Bobby.” These tokens were manipulated by adding or deleting silence using Audacity® (Audacity Team, 1999). Length of gap duration was systematically manipulated in increments of 20 ms across the four tokens, except for the gap in Token 4, which was first decreased by 150 ms and then by 20 ms in subsequent manipulations. The purpose of reducing the gap duration for Token 4 first by 150 ms was to increase the similarity in speech rate between Token 4 and Tokens 1–3. Gap times were increased for the fluent tokens and decreased for the disfluent tokens.
Appendix A includes utterance durations and manipulated gap durations both as absolute values and as percentages of total utterance durations. Calculating gap duration as a percentage of the total utterance accounted for potential differences in speaker rate across samples.
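The 20 ms manipulation scheme and the percent-of-utterance normalization can be illustrated with a short sketch. This is not the procedure used to edit the audio (that was done in Audacity); it only reproduces the arithmetic for Tokens 1 and 3. The function names and the 1100 ms utterance duration are hypothetical, since the actual durations are reported in Appendix A, and whether the total duration includes the manipulated gap is left to the caller.

```python
def gap_ladder(base_ms, direction, step_ms=20.0, n_steps=7):
    """Return the base gap followed by n_steps manipulated gaps (ms).

    direction = +1 lengthens the gap (fluent tokens);
    direction = -1 shortens it (disfluent tokens).
    """
    gaps = [base_ms]
    for _ in range(n_steps):
        gaps.append(round(gaps[-1] + direction * step_ms, 2))
    return gaps


def percent_of_utterance(gap_ms, utterance_ms):
    """Express a gap as a percentage of total utterance duration."""
    return 100.0 * gap_ms / utterance_ms


# Base gaps for Token 1 (fluent) and Token 3 (disfluent) as reported in the text.
token1_gaps = gap_ladder(23.59, direction=+1)
token3_gaps = gap_ladder(185.33, direction=-1)

# Hypothetical utterance duration, for illustration only.
example_percent = percent_of_utterance(110.0, 1100.0)

print(token1_gaps)
print(token3_gaps)
print(example_percent)
```

Token 4 is omitted from the sketch because its first manipulation used a different step size.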
2.3. Procedure
Each experiment lasted approximately 30 minutes. The experiment was self-paced and took place in a quiet location of the participants’ choosing (e.g., an office, their home) or in a research room in the NYU Communicative Sciences and Disorders Speech-Language Hearing Clinic. For the duration of the experiment, participants were seated in front of a MacBook Air 13-inch laptop computer (Apple). To minimize distraction, a plastic keyboard cover hid all laptop keys except for the “D” and “F” keys. Participants used the “D” laptop key to indicate if the utterance was disfluent and the “F” key to indicate fluent. The test procedure was created using the software PsychoPy v3.0.0 (Peirce & MacAskill, 2018). Before beginning the experiment, participants were verbally provided with the following instructions and definition of fluency:
Disfluency (versus fluency) can be reflected as pauses or hesitations in speech, for example, in the middle of a word or phrase. Disfluencies are sometimes obvious, sometimes they are very subtle. You will be hearing several versions of the same utterance, “Buy Bobby a puppy.” After you hear an utterance please decide if what you heard was fluent or disfluent by pressing the “D” or “F” keys on the laptop. These instructions will be on the screen when you begin. Do you have any questions?
Whereas other definitions of fluency typically include additional characteristics (e.g., automaticity, continuity, effort; Fillmore, 1979; Starkweather, 1987), the provided definition focused on pauses and hesitations, because gap duration was the only marker of disfluency, and the only parameter that was manipulated. The same instructions were provided on the computer screen for participants to read until the stimuli were presented. Participants completed five practice trials and then the stimuli were presented in a randomized order. For each trial, the different versions of “Buy Bobby a puppy” were presented and followed by the response screen (i.e., a grey screen with the white text “Disfluent (D) or Fluent (F)?”). The participant was prompted to select ‘D’ or ‘F’ on the laptop. Eight versions of the four original utterances (i.e., the original plus seven manipulations), were repeated ten times per participant for a total of 320 trials (i.e., 8×4×10). One naïve participant listened to only eight versions of each utterance totaling 256 trials due to a computer error. The trials were divided into four blocks, with a self-paced break offered in between each block. Participants had unlimited time to select their response, although the sound files could not be replayed.
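The trial structure (8 versions × 4 tokens × 10 repetitions, presented in random order across four blocks) can be sketched as below. This is an illustration only, not the PsychoPy experiment script: `build_trials` is a hypothetical helper, and shuffling all 320 trials before splitting them into equal blocks is an assumption, because the text does not state how randomization and blocking were combined.

```python
import random
from collections import Counter


def build_trials(n_tokens=4, n_versions=8, n_repeats=10, n_blocks=4, seed=None):
    """Build a shuffled trial list and split it into equal-sized blocks.

    Each trial is a (token, version) pair; every pair occurs n_repeats times.
    """
    rng = random.Random(seed)
    trials = [
        (token, version)
        for token in range(1, n_tokens + 1)
        for version in range(1, n_versions + 1)
        for _ in range(n_repeats)
    ]
    rng.shuffle(trials)
    block_size = len(trials) // n_blocks
    return [trials[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


blocks = build_trials(seed=1)
all_trials = [trial for block in blocks for trial in block]
counts = Counter(all_trials)

print(len(all_trials))            # total number of trials
print([len(b) for b in blocks])   # trials per block
```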
2.4. Data Analysis
2.4.1. Disfluency threshold – Research Question 1
To examine the relationship between gap duration and fluency perception, calculated as percent [of trials perceived as] fluent, the Spearman correlation coefficient was determined. The variable “percent fluent” was calculated per manipulation, per participant. For example, if a participant marked nine out of 10 trials fluent (versus disfluent) for the utterance containing a gap of 23.59 ms, the “percent fluent” for that particular utterance was 0.9. To determine the disfluency threshold, a logistic regression model was fit using the generalized linear model (glm) function in the lme4 package (Bates et al., 2014) in R (R Core Team, 2014). This model included data from all tokens, with absolute gap duration as the independent variable and response (i.e., fluent or disfluent) as the outcome variable. A logistic regression curve was plotted using the ggplot2 package (R Core Team, 2014; Wickham, 2016). The disfluency threshold was determined as the point at which the logistic regression curve crossed the 50th percentile, found using the predict function with the findInt and uniroot numerical solvers (Brent, 1973). The disfluency threshold was calculated both for gap duration in absolute length and for gap duration as a percent of the total utterance; the latter was used as a normalized metric to account for speech rate. The 50th percentile indicated that in 50% of trials, the utterance was categorized as disfluent. In addition, thresholds of disfluency with higher certainty (i.e., 80%, 90%, 95%) were calculated. To compare the impact of each incremental (20 ms) manipulation on fluency judgments, pairwise t-tests with Bonferroni correction were applied.
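The threshold estimation described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study. The data are synthetic, generated from a hypothetical curve with a 120 ms threshold; the helper names are invented for this example; the model is fit by a hand-written Newton-Raphson routine rather than glm; and a simple bisection search stands in for the uniroot solver.

```python
import math


def fit_logistic(xs, ys, n_iter=50):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    mean = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    zs = [(x - mean) / sd for x in xs]  # standardize for numerical stability
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient terms
            g1 += (y - p) * z
            h00 += w               # information-matrix terms
            h01 += w * z
            h11 += w * z * z
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    # Convert coefficients back to the original (millisecond) scale.
    return b0 - b1 * mean / sd, b1 / sd


def predict_disfluent(b0, b1, gap_ms):
    """Predicted probability of a 'disfluent' response at a given gap."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * gap_ms)))


def threshold(b0, b1, p=0.5, lo=0.0, hi=400.0, tol=1e-6):
    """Find the gap (ms) at which the fitted curve crosses p, by bisection."""
    f = lambda x: predict_disfluent(b0, b1, x) - p
    if not (f(lo) < 0.0 < f(hi)):
        raise ValueError("the curve does not cross p within [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Synthetic responses (1 = disfluent, 0 = fluent): 100 trials per gap value,
# drawn deterministically from a hypothetical curve centred on 120 ms.
xs, ys = [], []
for gap in range(20, 340, 20):
    p_true = 1.0 / (1.0 + math.exp(-(gap - 120.0) / 40.0))
    n_disfluent = round(100 * p_true)
    xs += [float(gap)] * 100
    ys += [1] * n_disfluent + [0] * (100 - n_disfluent)

b0, b1 = fit_logistic(xs, ys)
thr50 = threshold(b0, b1, p=0.5)
thr80 = threshold(b0, b1, p=0.8)
print(thr50, thr80)
```

For a two-parameter logistic model the 50% crossing also has the closed form −b0/b1; a numerical root-finder is shown because it generalizes to other probability levels and model forms.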
2.4.2. Impact of Listener Background and Experience – Research Question 2
Pairwise t-tests were used to compare responses between the AWS, the SLPs, and the naïve listeners within tokens. Additionally, disfluency thresholds were determined per group and per token using a glm model function with gap duration as the independent variable and response (i.e., fluent or disfluent) as the outcome variable. The predict function with the findInt and uniroot numerical solvers was used to determine disfluency thresholds (Brent, 1973).
Group differences across tokens were compared using the generalized linear mixed model (glmer) function in the lme4 package (Bates et al., 2014), and p-values were calculated using the lmerTest package with the Satterthwaite approximation for the degrees of freedom of the t distribution (Kuznetsova et al., 2017) for R. A mixed-effects logistic regression model was chosen to further assess group differences as it allows for the test of several variables on a binary outcome variable (i.e., fluent or disfluent) while incorporating fixed and random effects. Our model included response (i.e., fluent or disfluent) as the outcome variable with absolute gap duration and group as fixed factors. We used a manual stepwise regression procedure to select the fixed effects of our model. This model fitting procedure involves testing each variable in an additive manner and comparing Akaike information criterion (AIC) values. Random intercepts were included for the variables participant and token as both are repeated measures within our data (Harel & McAllister, 2019). Including participant and token as random factors accounts for within-participant and within-token variability. Additionally, the model fit was estimated by calculating R-squared values (Nakagawa et al., 2017) using the r.squaredGLMM function in the MuMIn package (Bartoń, 2020). Group comparisons included AWS versus SLPs, AWS versus naïve listeners, and naïve listeners versus SLPs, using a releveling procedure. Lastly, the kappa statistic was calculated for each group to assess interrater reliability of fluency judgments within that group (McHugh, 2012). See supplemental material for statistical analyses.
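As a reminder of what the kappa statistic measures, the sketch below computes Cohen's kappa for two raters making fluent/disfluent judgments on the same items. It is illustrative only: the ratings are invented, `cohens_kappa` is a hypothetical helper, and the text does not specify how judgments from the 20 raters in each group were combined into a single kappa value.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty item set")
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    p_expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented judgments for ten stimuli (F = fluent, D = disfluent).
rater_1 = list("FFFFDDDDFD")
rater_2 = list("FFFDDDDDFF")
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

Here the raters agree on 8 of 10 items, and chance agreement is 0.5 because each rater uses both labels equally often.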
3. Results
3.1. Disfluency threshold – Research Question 1
Unsurprisingly, there was a negative correlation between gap duration and percent designated as fluent across all tokens (r = −.688, p < .001). The 50% disfluency threshold occurred at a gap duration of 126.46 ms (or 11.52% of the total utterance) across all participants and tokens, as shown in Figure 1. This disfluency threshold indicates a gap duration at which 50% of the utterances were categorized as disfluent. The 80% fluent and disfluent thresholds were indicated by gap durations of 67.41 ms (6.66% of the total utterance) and 185.50 ms (16.38% of the total utterance), respectively. The 90% fluent and disfluent thresholds were indicated by gap durations of 32.88 ms (3.82% of the total utterance) and 220.04 ms (19.21% of the total utterance), respectively. The 95% fluent and disfluent thresholds were indicated by gap durations of 24.14 ms (2.57% of the total utterance) and 251.87 ms (21.83% of the total utterance), respectively.
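The relationship among the reported disfluent-side thresholds can be checked with a short calculation. The sketch assumes the fitted curve is a two-parameter logistic in absolute gap duration, as described in Section 2.4.1; the fitted coefficients themselves are not reported, so the scale parameter is inferred here from the 50% and 80% thresholds. The names `scale_ms` and `gap_at` are hypothetical.

```python
import math


def logit(p):
    """Log-odds of probability p."""
    return math.log(p / (1.0 - p))


# Reported thresholds (ms) for 50% and 80% 'disfluent' judgments.
x50 = 126.46
x80 = 185.50

# For a logistic curve, gap(p) = x50 + scale * logit(p), and logit(0.5) = 0.
scale_ms = (x80 - x50) / logit(0.80)


def gap_at(p):
    """Gap duration (ms) at which a proportion p of judgments is 'disfluent'."""
    return x50 + scale_ms * logit(p)


print(round(gap_at(0.90), 2))
print(round(gap_at(0.95), 2))
```

Under this assumption the recomputed 90% and 95% disfluent thresholds fall within a few hundredths of a millisecond of the reported 220.04 ms and 251.87 ms.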
Figure 1.

Logistic regression curve based on the generalized linear model (glm) determined by percentages of utterances selected as fluent against gap duration (ms) across all participants and tokens. A horizontal line marks the 50th percentile for fluency/disfluency, and a vertical dotted line marks the intersection between the curve and the 50th percentile to visualize the disfluency threshold. The intersection value, or disfluency threshold, is shown in the figure in milliseconds and as a percent of the total utterance.
Significant differences were primarily found in responses (fluent/disfluent) between consecutive gap manipulations at or around our proposed disfluency threshold (i.e., the point in time where judgements tended to shift from fluent to disfluent; Table 1). For Token 1, significant differences were found between consecutive gap manipulations 3 (63.59 ms) and 4 (83.59 ms; t = 4.76, p = .001, 95% CI [0.107, 0.260]), 4 and 5 (103.59 ms; t = 3.06, p = .010, 95% CI [0.055, 0.255]), and 5 and 6 (123.59 ms; t = 2.81, p = .009, 95% CI [0.046, 0.266]), after Bonferroni correction for multiple comparisons. For Token 2, a significant difference was observed between consecutive manipulations 4 (98.2 ms) and 5 (118.2 ms; t = 2.86, p = .100, 95% CI [0.045, 0.248]). For Token 3, a significant difference was found between gap manipulations 3 (145.33 ms) and 4 (125.33 ms; t = 2.97, p = .009, 95% CI [0.052, 0.262]). For Token 4, significant differences were found between manipulations 2 (175.44 ms) and 3 (125.44 ms; t = 4.07, p = .001, 95% CI [0.049, 0.210]) and between manipulations 5 (85.44 ms) and 6 (65.44 ms; t = 3.14, p = .006, 95% CI [0.112, 0.221]).
Table 1.
Results from pairwise t-tests with Bonferroni correction comparing the impact of each incremental (20 ms) gap manipulation on disfluency judgements. Base manipulation indicates the original, unaltered, utterance production. Asterisk indicates significant differences (p < .05) in responses, between manipulations.
| Token | Manipulation | Gap Duration (ms) | Response differences between manipulations | 95% CI |
|---|---|---|---|---|
| 1 | Base | 23.59 | | |
| | 2 | 43.59 | 1 – 2: t = 2.03, p = 1 | [0.001, 0.730] |
| | 3 | 63.59 | 2 – 3: t = 2.11, p = 1 | [0.003, 0.104] |
| | 4 | 83.59 | 3 – 4: t = 4.76, p = .001* | [0.107, 0.260] |
| | 5 | 103.59 | 4 – 5: t = 3.06, p = .010* | [0.055, 0.255] |
| | 6 | 123.59 | 5 – 6: t = 2.81, p = .009* | [0.046, 0.266] |
| | 7 | 143.59 | 6 – 7: t = 0.07, p = 1 | [0.046, 0.266] |
| | 8 | 163.59 | 7 – 8: t = 1.91, p = .719 | [−0.106, 0.114] |
| 2 | Base | 38.2 | | |
| | 2 | 58.2 | 1 – 2: t = 1.10, p = 1 | [−0.031, 0.108] |
| | 3 | 78.2 | 2 – 3: t = 1.65, p = 1 | [−0.013, 0.148] |
| | 4 | 98.2 | 3 – 4: t = 1.88, p = 1 | [−0.005, 0.181] |
| | 5 | 118.2 | 4 – 5: t = 2.86, p = .100* | [0.045, 0.248] |
| | 6 | 138.2 | 5 – 6: t = 2.34, p = .670 | [0.019, 0.232] |
| | 7 | 158.2 | 6 – 7: t = 2.10, p = 1 | [0.006, 0.224] |
| | 8 | 178.2 | 7 – 8: t = 1.70, p = 1 | [−0.014, 0.179] |
| 3 | Base | 185.33 | | |
| | 2 | 165.33 | 1 – 2: t = 1.95, p = .778 | [−0.002, 0.192] |
| | 3 | 145.33 | 2 – 3: t = 0.71, p = 1 | [−0.068, 0.143] |
| | 4 | 125.33 | 3 – 4: t = 2.97, p = .009* | [0.052, 0.262] |
| | 5 | 105.33 | 4 – 5: t = 2.23, p = .246 | [0.013, 0.215] |
| | 6 | 85.33 | 5 – 6: t = 3.38, p = .018* | [0.062, 0.236] |
| | 7 | 65.33 | 6 – 7: t = 2.78, p = 1 | [0.025, 0.149] |
| | 8 | 45.33 | 7 – 8: t = 3.02, p = 1 | [0.019, 0.091] |
| 4 | Base | 325.44 | | |
| | 2 | 175.44 | 1 – 2: t = 3.57, p = .398 | [−0.003, 0.120] |
| | 3 | 125.44 | 2 – 3: t = 4.07, p = .001* | [0.049, 0.210] |
| | 4 | 105.44 | 3 – 4: t = 2.21, p = .165 | [0.058, 0.256] |
| | 5 | 85.44 | 4 – 5: t = 1.55, p = 1 | [−0.023, 0.187] |
| | 6 | 65.44 | 5 – 6: t = 3.14, p = .006* | [0.112, 0.221] |
| | 7 | 45.44 | 6 – 7: t = 3.17, p = .065 | [0.095, 0.275] |
| | 8 | 25.44 | 7 – 8: t = 1.87, p = 1 | [0.046, 0.161] |
Note. CI = confidence interval.
3.2. Group differences – Research Question 2
As shown in Table 2, the within-token t-tests revealed significant differences between SLPs and AWS for three out of four tokens, and differences between naïve listeners and SLPs for two out of four tokens. No differences were observed between the AWS and naïve groups. The SLPs consistently had the highest thresholds for disfluency and the AWS had the lowest thresholds (Table 2). Figure 2 depicts the means of percentages of items selected fluent for each manipulation against gap duration for each token for each group with error bars representing standard error of the mean. Visual inspection of the data indicated that SLPs required longer gap times to mark utterances as disfluent in all tokens aside from Token 2, where responses were similar between groups.
Table 2.
Threshold for disfluency for each group (i.e., AWS, naïve, SLP) and results from pairwise t-tests comparing groups. Disfluency threshold determined by the intersection between logistic regression curve and 50th fluency percentile, separated by token. Fitted logistic regression curve was generated with gap duration as the independent variable and response (i.e., fluent or disfluent) as the outcome variable. T-tests were conducted, per token, comparing responses between groups. Asterisk indicates significant differences (p < .05) in responses, between groups.
| | Token 1 | Token 2 | Token 3 | Token 4 |
|---|---|---|---|---|
| AWS | 113.15 ms | 120.44 ms | 130.25 ms | 97.97 ms |
| AWS vs. Naïve | t = 0.31, p = .755 | t = −0.81, p = .419 | t = −0.37, p = .715 | t = −0.94, p = .350 |
| 95% CI | [−0.063, 0.087] | [−0.105, 0.044] | [−0.876, 0.060] | [−0.114, 0.040] |
| Naïve | 114.37 ms | 127.35 ms | 134.12 ms | 112.07 ms |
| Naïve vs. SLP | t = −2.24, p = .011* | t = −0.68, p = .499 | t = −2.61, p = .009* | t = −1.91, p = .057 |
| 95% CI | [0.022, 0.176] | [−0.050, 0.103] | [0.024, 0.172] | [−0.002, 0.158] |
| SLP | 129.87 ms | 134.27 ms | 154.63 ms | 131.55 ms |
| SLP vs. AWS | t = 2.24, p = .026* | t = 1.44, p = .151 | t = 2.94, p = .004* | t = 2.76, p = .006* |
| 95% CI | [0.011, 0.163] | [−0.021, 0.135] | [0.037, 0.187] | [0.033, 0.196] |
Note. CI = confidence interval.
Figure 2.

Means of percentages of items selected fluent against gap duration (ms) for each token per participant group. Error bars represent standard error of the mean. Horizontal line at the 50th percentile for fluency/disfluency.
The mixed model, which accounted for within-participant and within-token variability, revealed that the SLPs marked fewer utterances as disfluent than the AWS as gap duration increased (z = 2.047, p = .042, 95% CI [0.017, 1.552]). There was no evidence of a difference between naïve listeners and the AWS group (z = 0.615, p = .538, 95% CI [−0.518, 1.016]) or between naïve listeners and the SLP group (z = 1.431, p = .152, 95% CI [−0.518, 1.016]). Regarding model fit, our model led to a marginal R-squared value of 0.391 and a conditional R-squared value of 0.590.
Regarding consistency within groups, Cohen’s kappa indicated that AWS exhibited the greatest consistency in fluency judgements (kappa = 0.54) as compared with the naïve group (kappa = 0.45) and SLPs (kappa = 0.44). The kappa values for all three groups indicate a moderate level of agreement as they fall between 0.41–0.60 (McHugh, 2012).
4. Discussion
The purposes of this study were to determine a gap duration threshold for when speech perception is likely to shift from fluent to disfluent within a particular set of stimuli, and to assess the impact of listener background and experience on this threshold. Judgments were made based on short utterances with typical speech rates containing gaps with range and frequency parameters specific to this study. Disfluency perception operates on a continuum; however, across groups and tokens, judgments tended to change from fluent to disfluent at a gap duration of 126.46 ms, or 11.52% of the total duration of the utterance. The SLPs exhibited higher disfluency thresholds than the AWS. There was some evidence that the naïve group exhibited lower thresholds than SLPs, but this result was token-dependent and impacted by within-participant variation. These findings are discussed in greater detail below.
4.1. Disfluency threshold – Research Question 1
Gap duration, particularly at non-syntactic boundaries, is a critical indicator of whether speech is perceived as fluent or disfluent. The current study systematically manipulated gap duration of a single utterance and was therefore able to estimate a point in time, based on the current set of stimuli, at which judgments tended to shift from fluent to disfluent. We found that utterances containing gaps of approximately 126 ms or greater are likely to be perceived as disfluent. In the only previous study to manipulate gap duration, Lövgren and Doorn (2005) found that short speech samples with pauses ranging from 98–479 ms were judged to be disfluent significantly more often than speech samples with pauses ranging from 27–100 ms. While Lövgren and Doorn (2005) systematically manipulated gap duration, fluency judgments were based on entire speech samples (15–20 s in length), within which there were gaps of differing durations. In contrast, the utterances in the present study contained only one gap, allowing for a more specific disfluency threshold to be proposed.
Interestingly, 20 ms increases in gap duration, specifically those in close proximity to gap duration thresholds, had significant impacts on disfluency judgments. For example, a shift from a gap duration of 103.59 ms to 123.59 ms for Token 1 contributed to significantly more disfluent judgments, but a shift from a gap duration of 143.59 ms to 163.59 ms did not contribute to significantly more disfluent judgments. Thus, utterances containing gaps close to the proposed threshold are more ambiguous, which is in line with previous work on categorical perception (Rosen & Howell, 1981). It may be that utterances with gap durations near the disfluency threshold, as identified in the current study, represent tenuous fluency (Adams & Runyan, 1981), or speech that appears fluent but is on the cusp of breaking down and may only be identifiable using physiological measurements (e.g., motion tracking, acoustic measurements). Adding a threshold such as that determined in the current study may improve investigators’ ability to identify instances of tenuous fluency.
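To make the notion of a 50% threshold concrete, the following Python sketch shows how such a point can be read off a logistic response curve. It is a minimal illustration under stated assumptions: the curve is a simple fixed-effects logistic function, the slope is hypothetical, and the intercept is chosen only so that the curve crosses 50% at the reported 126.46 ms. None of these values are study estimates, and the study's mixed model, which also accounted for participant and token variability, is not reproduced here.

```python
import math

def p_disfluent(gap_ms, intercept, slope):
    """Logistic probability of a 'disfluent' response at a given gap duration."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * gap_ms)))

def threshold_ms(intercept, slope):
    """Gap duration at which the curve crosses 50%, i.e. where the log-odds are 0."""
    return -intercept / slope

# Hypothetical coefficients for illustration only (NOT estimates from the study).
slope = 0.03                 # assumed change in log-odds per ms of gap
intercept = -slope * 126.46  # places the 50% point at the reported threshold

estimated_threshold = threshold_ms(intercept, slope)
p_at_threshold = p_disfluent(estimated_threshold, intercept, slope)

print(round(estimated_threshold, 2))
print(round(p_at_threshold, 3))
```

With a curve of this form, gaps shorter than the threshold yield a probability below 0.5 and longer gaps a probability above 0.5; how steeply judgments change around the threshold depends on the slope.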
Importantly, fluency judgments are driven by context, and therefore, it is unlikely that there is a fixed point threshold when judgments shift from fluent to disfluent. For example, changes in speaking rate may impact fluency judgments. The tokens in the current study were in part chosen because they reflect a typical speaking rate (i.e., approximately 6 syllables/second). These rates are comparable to the “normal” rate of the same utterance (“Buy Bobby a puppy”) used in Smith and Kleinow (2000). Further, we included gap duration as a percentage of utterance duration to account for the subtle changes in speech rate across the utterances. These results mirrored those for the absolute measure, suggesting that disfluency perception in the present study may be based on absolute time and not dependent upon speech rate. Future studies could examine greater ranges of speech rate or duration – for example, “slow” and “fast” rates as in Smith and Kleinow (2000).
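The two measures discussed above can be illustrated with a short Python sketch that computes gap duration as a percentage of utterance duration, along with an overall speaking rate, for a few rows copied from Appendix A. The function names are illustrative, and the rate calculation assumes six syllables for "Buy Bobby a puppy" and divides by the full sentence duration including the gap, which may differ from how rate was measured in the study.

```python
SYLLABLES = 6  # "Buy Bobby a puppy": Buy / Bob-by / a / pup-py

# (token, sentence duration in ms, gap duration in ms), copied from Appendix A
rows = [
    (1, 930.00, 23.59),
    (1, 1070.00, 163.59),
    (4, 1276.53, 325.44),
    (4, 976.53, 25.44),
]

def gap_percent(sentence_ms, gap_ms):
    """Gap duration expressed as a percentage of the whole utterance."""
    return 100.0 * gap_ms / sentence_ms

def syllables_per_second(sentence_ms, n_syllables=SYLLABLES):
    """Overall rate over the full utterance, gap included (an assumption)."""
    return n_syllables / (sentence_ms / 1000.0)

for token, sentence_ms, gap_ms in rows:
    print(
        token,
        round(gap_percent(sentence_ms, gap_ms), 2),
        round(syllables_per_second(sentence_ms), 2),
    )
```

Because lengthening the gap also lengthens the utterance, the percentage measure grows more slowly than the absolute gap duration.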
In addition, the disfluency threshold determined here only applies to the range and frequency of gap durations used in the stimuli. That is, a different range of gap durations may have yielded a different gap duration threshold. It should be noted, however, that range effects observed in previous studies have been due to responses being at the center of the range and impacted by the order of stimuli presented (e.g., Bailey et al., 1977; Brady & Darwin, 1978). Our trials were presented randomly and the disfluency threshold was not in the middle of the range of gap durations (i.e., median = 104.46 ms). Further, while all utterance versions were presented the same number of times, disfluency perception in natural, spontaneous speech could be influenced by frequency effects. Pauses in spontaneous speech will not have the same frequency distribution for durations as in the current study and this could affect threshold estimates (e.g., Durlach & Braida, 1969; Parducci, 2012; Parducci et al., 1960). The threshold determined in the current study may not reflect a threshold determined during spontaneous speech. Additionally, the gap in the current study occurred in the same place between the same two words for all stimuli (i.e., the gap began at the diphthong /aɪ/ in “Buy” and ended at the stop release for the first /b/ in “Bobby”). Future work should examine the perception of disfluency including gaps of differing quantities and durations, as well as examine gaps between other types of phonemes or gaps in additional word positions and within different utterances.
The threshold proposed in this study could be used in conjunction with other parameters of fluency (e.g., speech rate, prosody) as a step in understanding whether speech is likely to be perceived as fluent or disfluent. When connected speech samples with gap durations near the disfluency threshold (i.e., 126 ms) are included in research, they should be examined with increased attention, as fluency judgments may shift around gaps of this duration. Utterances with gaps around 126 ms are not likely to be unanimously perceived as fluent or disfluent. For example, this threshold is relevant for research that compares the speech motor skills of AWS and AWNS, which typically excludes disfluent speech from analysis (e.g., Few & Lingwall, 1972; Healey & Ramig, 1986; Love & Jeffress, 1971; Peters et al., 1989; Smith et al., 2010; Young, 1964; Zimmermann & Hanley, 1983). Objective measures of disfluency are especially important for research examining speech variability over repeated trials, because variations in the timing of trials, such as insertions of pauses, can impact variability measures (e.g., Jackson, Tiede, Riley, et al., 2016; Kleinow & Smith, 2000; Lucero, 2005). Studies with the goal of examining subtle speech differences between AWS and AWNS could possibly benefit from considering the gap duration thresholds proposed in the current study.
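The median cited above can be recomputed from the gap durations listed in Appendix A. The Python sketch below does so and compares it with the reported threshold; the variable names are illustrative, and the values are copied from the appendix table.

```python
from statistics import median

# Gap durations (ms) per token, copied from Appendix A
gaps_ms = {
    1: [23.59, 43.59, 63.59, 83.59, 103.59, 123.59, 143.59, 163.59],
    2: [38.2, 58.2, 78.2, 98.2, 118.2, 138.2, 158.2, 178.2],
    3: [185.33, 165.33, 145.33, 125.33, 105.33, 85.33, 65.33, 45.33],
    4: [325.44, 175.44, 125.44, 105.44, 85.44, 65.44, 45.44, 25.44],
}

# Pool all 32 stimuli and take the median gap duration
all_gaps = [gap for token_gaps in gaps_ms.values() for gap in token_gaps]
stimulus_median = median(all_gaps)

reported_threshold = 126.46  # ms, as reported in the text

print(len(all_gaps), round(stimulus_median, 2))
print(round(reported_threshold - stimulus_median, 2))
```

The reported threshold sits above the median of the stimulus set, which is the basis for the statement that it did not fall at the middle of the presented gap durations.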
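One way the recommendation above might be put into practice is to flag samples whose gap falls within some window of the threshold for closer review. The Python sketch below is only an illustration: the function name is hypothetical, and the ±20 ms window is an assumption borrowed from the step size of the stimuli, not a criterion proposed by the study.

```python
REPORTED_THRESHOLD_MS = 126.46  # threshold reported in the text

def near_threshold(gap_ms, threshold_ms=REPORTED_THRESHOLD_MS, window_ms=20.0):
    """Return True if a gap lies within an assumed window around the threshold."""
    return abs(gap_ms - threshold_ms) <= window_ms

# Token 1 gap durations (ms) from Appendix A, used here as example input
token1_gaps = [23.59, 43.59, 63.59, 83.59, 103.59, 123.59, 143.59, 163.59]

# Gaps that would be set aside for closer inspection under this assumption
flagged = [gap for gap in token1_gaps if near_threshold(gap)]
print(flagged)
```

A narrower or wider window would change which samples are flagged, so any such window would need to be justified for the materials at hand.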
4.2. Group differences – Research Question 2
The second aim of this study was to determine if listeners with different backgrounds differ in their perceptions of disfluency. The SLPs exhibited higher disfluency thresholds than the AWS, and there was some evidence that they exhibited higher thresholds than the naïve listeners. It appears that the professional training of SLPs does not make them more likely to report speech disfluencies. This is a curious finding. The SLP participants had a range of years of clinical experience, and none of them specialized in fluency disorders. Our findings are consistent with Brundage et al. (2006), which showed that only SLPs with expertise in fluency disorders are more skilled at identifying disfluencies. The focus of SLP education is generally on atypical or stutter-like disfluencies, as opposed to typical disfluencies. This focus on atypical disfluencies could have led generalist SLPs to disregard short pauses in speech, despite the task instructions indicating that disfluency can include quick and subtle pauses in speech. An alternative explanation is that SLPs are accustomed to encouraging people who stutter, for example by being less critical of disfluencies. The SLPs in the current study could have been striving to be less critical of speech that deviates from “typical” speech, perhaps due to the integration of counseling and empathy-building in fluency disorders courses. AWS and naïve listeners, on the other hand, do not have this training or background. Indeed, AWS are often overly critical of themselves and their speech (Lickley et al., 2005).
Regarding the differences between the SLPs and the naïve listeners, the tokens for which there were differences between groups were produced by a female speaker (Token 1/Token 3), whereas the tokens for which there were no differences were both produced by male speakers (Token 2/Token 4). It may be that gender contributed to the differences, or that specific characteristics of the female speaker contributed to the differences. Given the limited number of speakers, and that the only parameter manipulated was gap duration, we cannot answer this question. Future work could increase the number of stimuli by using more speakers, particularly male speakers as the majority of adult stutterers are male, as well as manipulate other parameters (e.g., prosody) to determine their impact on fluency perception.
4.3. Considerations
Four discrete tokens were used; therefore, it is possible that other speech parameters, in addition to gap duration, contributed to the participants’ perception of disfluency. Two tokens were produced by an AWS and two by AWNS, and two tokens were produced by a male and two by a female speaker. Fundamental frequency or resonances of the speakers could have contributed to differences in responses between tokens. Additionally, because the majority of AWS are male, participants may be more accustomed to hearing disfluencies in the speech of males, which may have impacted responses. Further, although the tokens contained neutral stress patterns, gaps in speech can be perceived as a characteristic of stress (Duez, 1985; Lickley, 1994) and it is possible that participants interpreted gaps as markers of stress as opposed to markers of disfluency. Specifically, participants could have perceived the gap between “Buy” and “Bobby” as indicating that the following word (i.e., “Bobby”) was stressed, rather than identifying the utterance as disfluent. Future research could examine other parameters of fluency (e.g., speech rate, prosody). For example, prosody could be systematically manipulated while keeping other parameters of fluency consistent (e.g., gap duration, speech rate), to determine how changes in prosody impact perception of short utterances as fluent or disfluent.
Lastly, the provided definition of fluency in the instructions may have differentially impacted the groups. The instructions aimed to draw participants’ attention to pauses or hesitations in speech, as gap duration was the only marker of disfluency within the stimuli, as well as the only manipulated variable. However, due to their personal and professional experiences, the AWS or SLPs may have judged the utterances based on pre-existing definitions of fluency, whereas the naïve group may have judged the utterances based on the provided definition of fluency. It is therefore possible that the differences found between groups were impacted by listeners applying different guidelines for fluency and disfluency.
5. Conclusion
This study estimated a threshold for the perception of disfluency based on short utterances spoken at typical rates, which contained manipulated gaps within a study-specific range. This threshold approximates when the perception of these utterances shifted from fluent to disfluent, based on gap duration. SLPs exhibited the highest disfluency thresholds, indicating that generalist SLPs are less inclined than naïve listeners or AWS to identify disfluencies in speech. Future work could include greater variation in gap durations and speakers to further study the impact of context on the perception of disfluency.
Supplementary Material
Highlights.
We report a threshold for disfluency perception based on a fixed set of stimuli with equal frequency, for utterances with typical speech rates.
Adults who stutter have a lower disfluency threshold for these materials than speech-language pathologists.
Background and life experience impact the perception of disfluency.
Acknowledgments:
The authors thank all of the individuals who participated in this study. This work was funded in part by NIH grant DC-002717 to Haskins Laboratories.
Bios
Haley J. Warner is a doctoral student in the stuttering and vvariability (savvy) lab at New York University and a speech-language pathologist. She received her master’s degree in Communicative Sciences and Disorders from New York University.
D.H. Whalen is a Distinguished Professor of Speech-Language-Hearing Sciences at the City University of New York’s Graduate Center. He is also affiliated with the Linguistics Departments of CUNY and of Yale University. At the independent research institute Haskins Laboratories in New Haven, CT, he is Vice-President of Research.
Daphna Harel is an Associate Professor of Applied Statistics at the Department of Applied Statistics, Social Science, and Humanities at New York University. Her research focuses on modeling and measurement in a variety of health-related fields as well as the social sciences.
Eric S. Jackson, PhD, CCC-SLP is an Assistant Professor and Director of the stuttering and vvariability (savvy) lab at NYU. His research program investigates the contextual variability of stuttering and the factors that drive variability, including social interaction and anticipation. Dr. Jackson is also a speech-language pathologist with more than 10 years of experience.
Appendix A. Stimuli and duration of the gap between “Buy” and “Bobby.” Un-altered/original productions are highlighted in grey.
| Token | Manipulation number | Sentence Duration (ms) | Gap Duration (ms) | Gap Duration (% of utterance) |
|---|---|---|---|---|
| 1 | 1 | 930 | 23.59 | 2.54% |
| 1 | 2 | 950 | 43.59 | 4.59% |
| 1 | 3 | 970 | 63.59 | 6.56% |
| 1 | 4 | 990 | 83.59 | 8.44% |
| 1 | 5 | 1010 | 103.59 | 10.26% |
| 1 | 6 | 1030 | 123.59 | 12.00% |
| 1 | 7 | 1050 | 143.59 | 13.68% |
| 1 | 8 | 1070 | 163.59 | 15.29% |
| 2 | 1 | 1053.69 | 38.2 | 3.63% |
| 2 | 2 | 1073.69 | 58.2 | 5.42% |
| 2 | 3 | 1093.69 | 78.2 | 7.15% |
| 2 | 4 | 1113.69 | 98.2 | 8.82% |
| 2 | 5 | 1133.69 | 118.2 | 10.43% |
| 2 | 6 | 1153.69 | 138.2 | 11.98% |
| 2 | 7 | 1173.69 | 158.2 | 13.48% |
| 2 | 8 | 1193.69 | 178.2 | 14.93% |
| 3 | 1 | 1139.33 | 185.33 | 16.27% |
| 3 | 2 | 1119.33 | 165.33 | 14.77% |
| 3 | 3 | 1099.33 | 145.33 | 13.22% |
| 3 | 4 | 1079.33 | 125.33 | 11.61% |
| 3 | 5 | 1059.33 | 105.33 | 9.94% |
| 3 | 6 | 1039.33 | 85.33 | 8.21% |
| 3 | 7 | 1019.33 | 65.33 | 6.41% |
| 3 | 8 | 999.33 | 45.33 | 4.54% |
| 4 | 1 | 1276.53 | 325.44 | 25.49% |
| 4 | 2 | 1126.53 | 175.44 | 15.57% |
| 4 | 3 | 1076.53 | 125.44 | 11.65% |
| 4 | 4 | 1056.53 | 105.44 | 9.98% |
| 4 | 5 | 1036.53 | 85.44 | 8.24% |
| 4 | 6 | 1016.53 | 65.44 | 6.44% |
| 4 | 7 | 996.53 | 45.44 | 4.56% |
| 4 | 8 | 976.53 | 25.44 | 2.61% |
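Because gap duration was the only parameter manipulated, the sentence duration minus the gap duration should be constant within each token. The Python sketch below checks this for the values in the table above; the function and variable names are illustrative, and rounding to 0.01 ms is an assumption made to absorb floating-point error.

```python
from collections import defaultdict

# (token, sentence duration in ms, gap duration in ms), copied from Appendix A
stimuli = [
    (1, 930, 23.59), (1, 950, 43.59), (1, 970, 63.59), (1, 990, 83.59),
    (1, 1010, 103.59), (1, 1030, 123.59), (1, 1050, 143.59), (1, 1070, 163.59),
    (2, 1053.69, 38.2), (2, 1073.69, 58.2), (2, 1093.69, 78.2), (2, 1113.69, 98.2),
    (2, 1133.69, 118.2), (2, 1153.69, 138.2), (2, 1173.69, 158.2), (2, 1193.69, 178.2),
    (3, 1139.33, 185.33), (3, 1119.33, 165.33), (3, 1099.33, 145.33), (3, 1079.33, 125.33),
    (3, 1059.33, 105.33), (3, 1039.33, 85.33), (3, 1019.33, 65.33), (3, 999.33, 45.33),
    (4, 1276.53, 325.44), (4, 1126.53, 175.44), (4, 1076.53, 125.44), (4, 1056.53, 105.44),
    (4, 1036.53, 85.44), (4, 1016.53, 65.44), (4, 996.53, 45.44), (4, 976.53, 25.44),
]

def non_gap_durations(rows):
    """Collect sentence duration minus gap duration (rounded to 0.01 ms) per token."""
    by_token = defaultdict(set)
    for token, sentence_ms, gap_ms in rows:
        by_token[token].add(round(sentence_ms - gap_ms, 2))
    return dict(by_token)

remainders = non_gap_durations(stimuli)
for token, values in sorted(remainders.items()):
    print(token, sorted(values))
```

A single value per token indicates that the table is internally consistent with a manipulation that altered only the gap.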
Appendix B. Spectrograms and Oscillograms for each un-manipulated utterance (i.e., “Buy Bobby a Puppy”).

References
- Adams MR, & Runyan CM (1981). Stuttering and fluency: Exclusive events or points on a continuum? Journal of Fluency Disorders, 6(3), 197–218.
- American Speech-Language-Hearing Association. (2019). A demographic snapshot of SLPs. The ASHA Leader, 24(7).
- Bailey PJ, Summerfield Q, & Drooman M (1977). On the identification of sine-wave analogues of certain speech sounds. Haskins Laboratories Status Report, SR-51/52, 1–25.
- Bartoń K (2020). MuMIn: Multi-Model Inference (R package version 1.43.17) [Computer software] https://CRAN.R-project.org/package=MuMIn
- Bates D, Mächler M, Bolker B, & Walker S (2014). Fitting Linear Mixed-Effects Models using lme4. ArXiv:1406.5823 [Stat] http://arxiv.org/abs/1406.5823
- Bosker HR, Quené H, Sanders T, & De Jong NH (2014). The perception of fluency in native and nonnative speech. Language Learning, 64(3), 579–614.
- Brady SA, & Darwin CJ (1978). Range effect in the perception of voicing. Journal of the Acoustical Society of America, 63, 1556–1558.
- Brent R (1973). Algorithms for Minimization without Derivatives.
- Brown SL, & Colcord RD (1987). Perceptual comparisons of adolescent stutterers’ and nonstutterers’ fluent speech. Journal of Fluency Disorders, 12(6), 419–427.
- Brundage SB, Bothe AK, Lengeling AN, & Evans JJ (2006). Comparing judgments of stuttering made by students, clinicians, and highly experienced judges. Journal of Fluency Disorders, 31(4), 271–283.
- Campione E, & Véronis J (2002). A large-scale multilingual study of silent pause duration. Speech Prosody 2002, International Conference.
- Cordes AK, & Ingham RJ (1995). Judgments of Stuttered and Nonstuttered Intervals by Recognized Authorities in Stuttering Research. Journal of Speech, Language, and Hearing Research, 38(1), 33–41.
- Cordes AK, Ingham RJ, Frank P, & Ingham JC (1992). Time-Interval Analysis of Interjudge and Intrajudge Agreement for Stuttering Event Judgments. Journal of Speech, Language, and Hearing Research, 35(3), 483–494.
- Cucchiarini C, van Doremalen J, & Strik H (2010). Fluency in non-native read and spontaneous speech. DiSS-LPSS Joint Workshop 2010.
- Duez D (1982). Silent and non-silent pauses in three speech styles. Language and Speech, 25(1), 11–28.
- Duez D (1985). Perception of silent pauses in continuous speech. Language and Speech, 28(4), 377–389.
- Durlach NI, & Braida LD (1969). Intensity Perception. I. Preliminary Theory of Intensity Resolution. The Journal of the Acoustical Society of America, 46(2B), 372–383.
- Few LR, & Lingwall JB (1972). A further analysis of fluency within stuttered speech. Journal of Speech and Hearing Research, 15(2), 356–363.
- Fillmore C (1979). On fluency. Individual Differences in Language Ability and Language Behavior, 81, 85–102.
- Finn P, & Ingham RJ (1989). The Selection of “Fluent” Samples in Research on Stuttering: Conceptual and Methodological Considerations. Journal of Speech, Language, and Hearing Research, 32(2), 401–418.
- Goldman-Eisler F (1961). The distribution of pause durations in speech. Language and Speech, 4(4), 232–237.
- Harel D, & McAllister T (2019). Multilevel Models for Communication Sciences and Disorders. Journal of Speech, Language, and Hearing Research, 62(4), 783–801.
- Healey C, & Ramig P (1986). Acoustic Measures of Stutterers’ and Nonstutterers’ Fluency in Two Speech Contexts. Journal of Speech Language and Hearing Research, 29(3).
- Hieke AE, Kowal S, & O’Connell DC (1983). The trouble with “articulatory” pauses. Language and Speech, 26(3).
- Howell P, & Wingfield T (1990). Perceptual and acoustic evidence for reduced fluency in the vicinity of stuttering episodes. Language and Speech, 33(1), 31–46.
- Jackson ES, Tiede M, Beal D, & Whalen DH (2016). The Impact of Social–Cognitive Stress on Speech Variability, Determinism, and Stability in Adults Who Do and Do Not Stutter. Journal of Speech Language and Hearing Research, 59(6), 1295.
- Jackson ES, Tiede M, Riley MA, & Whalen DH (2016). Recurrence Quantification Analysis of Sentence-Level Speech Kinematics. Journal of Speech, Language, and Hearing Research, 59(6), 1315–1326.
- Kleinow J, & Smith S (2000). Influences of Length and Syntactic Complexity on the Speech Motor Stability of the Fluent Speech of Adults Who Stutter. Journal of Speech, Language, and Hearing Research, 43(2), 548–559.
- Krikorian CM, & Runyan CM (1983). A perceptual comparison: Stuttering and nonstuttering children’s nonstuttered speech. Journal of Fluency Disorders, 8(4), 283–290.
- Krivokapić J (2007). Prosodic planning: Effects of phrasal length and complexity on pause duration. Journal of Phonetics, 35(2), 162–179.
- Kuznetsova A, Brockhoff PB, & Christensen R (2017). lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software, 82(13), 1–26.
- Lickley RJ (1994). Detecting disfluency in spontaneous speech [Ph.D., University of Edinburgh] http://hdl.handle.net/1842/21358
- Lickley RJ, Hartsuiker RJ, Corley M, Russell M, & Nelson R (2005). Judgment of Disfluency in People who Stutter and People who do not Stutter: Results from Magnitude Estimation. Language and Speech, 48(3), 299–312.
- Love LR, & Jeffress LA (1971). Identification of Brief Pauses in the Fluent Speech of Stutterers and Nonstutterers. Journal of Speech and Hearing Research, 14(2), 229–240.
- Lövgren T, & Doorn JV (2005). Influence of manipulation of short silent pause duration on speech fluency. Proc. DISS2005, 123–126.
- Lucero JC (2005). Comparison of Measures of Variability of Speech Movement Trajectories Using Synthetic Records. Journal of Speech, Language, and Hearing Research, 48(2), 336–344.
- MacAskill MR, & Peirce JW (2018). Building Experiments in PsychoPy. Sage.
- Martin JG, & Strange W (1968). The perception of hesitation in spontaneous speech. Perception & Psychophysics, 3(6), 427–438.
- McHugh ML (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282.
- Nakagawa S, Johnson PCD, & Schielzeth H (2017). The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. 11.
- Parducci A (1965). Category judgment: A range-frequency model. Psychological Review, 72(6), 407–418.
- Parducci A (2012). Contextual effects: A Range-Frequency Analysis. In Carterette E, Psychophysical Judgment and Measurement (pp. 127–140). Elsevier.
- Parducci A, Calfee RC, Marshall LM, & Davidson LP (1960). Context effects in judgment: Adaptation level as a function of the mean, midpoint, and median of the stimuli. Journal of Experimental Psychology, 60(2), 65.
- Peters HFM, Hulstijn W, & Starkweather CW (1989). Acoustic and Physiological Reaction Times of Stutterers and Nonstutterers. Journal of Speech, Language, and Hearing Research, 32(3).
- Prosek RA, & Runyan CM (1983). Effects of segment and pause manipulations on the identification of treated stutterers. Journal of Speech, Language, and Hearing Research, 26(4), 510–516.
- R Core Team. (2014). R: A language and environment for statistical computing. http://www.R-project.org/
- Robb M, & Blomgren M (1997). Analysis of F2 Transitions in the speech of stutterers and nonstutterers. Journal of Fluency Disorders, 22(1), 1–16.
- Rosen SM, & Howell P (1981). Plucks and bows are not categorically perceived. Perception & Psychophysics, 30(2), 156–168.
- Ruder KF, & Jensen PJ (1972). Fluent and hesitation pauses as a function of syntactic complexity. Journal of Speech and Hearing Research, 15(1), 49–60.
- Segalowitz N (2010). Cognitive bases of second language fluency. Routledge.
- Smith A, Sadagopan N, Walsh B, & Weber-Fox C (2010). Increasing phonological complexity reveals heightened instability in inter-articulatory coordination in adults who stutter. Journal of Fluency Disorders, 35(1), 1–18.
- Starkweather CW (1987). Fluency and stuttering. Prentice-Hall, Inc.
- Trofimovich P, & Baker W (2006). Learning second language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition, 28(1), 1–30.
- Vasic N, & Wijnen FNK (2005). Stuttering as a monitoring deficit. In Hartsuiker RJ, Bastiaanse R, Postma A, & Wijnen F (Eds.), Phonological encoding and monitoring in normal and pathological speech (pp. 226–247). Hove: Psychology Press.
- Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag. https://ggplot2.tidyverse.org
- Yairi E, Ambrose N, & Cox N (1996). Genetics of stuttering: A critical review. Journal of Speech and Hearing Research, 39, 771–784.
- Young MA (1964). Identification of Stutterers from Recorded Samples of Their “Fluent” Speech. Journal of Speech and Hearing Research, 7(3), 302–303.
- Young MA (1984). Identification of stuttering and stutterers. Nature and Treatment of Stuttering: New Directions, 13–30.
- Zimmermann GN, & Hanley JM (1983). A Cinefluorographic Investigation of Repeated Fluent Productions of Stutterers in an Adaptation Procedure. Journal of Speech, Language, and Hearing Research, 26(1), 35–42.
