Abstract
Purpose:
An extra moment after a sentence is spoken may be important for listeners with hearing loss to mentally repair misperceptions during listening. The current audiologic test battery cannot distinguish between a listener who repaired a misperception versus a listener who heard the speech accurately with no need for repair. This study aims to develop a behavioral method to identify individuals who are at risk for relying on a quiet moment after a sentence.
Method:
Forty-three individuals with hearing loss (31 cochlear implant users, 12 hearing aid users) heard sentences that were followed by either 2 s of silence or 2 s of babble noise. Both high- and low-context sentences were used in the task.
Results:
Some individuals showed notable benefit in accuracy scores (particularly for high-context sentences) when given an extra moment of silent time following the sentence. This benefit was highly variable across individuals and sometimes absent altogether. However, the group-level patterns of results were mainly explained by the use of context and successful perception of the words preceding sentence-final words.
Conclusions:
These results suggest that some but not all individuals improve their speech recognition score by relying on a quiet moment after a sentence, and that this fragility of speech recognition cannot be assessed using one isolated utterance at a time. Reliance on a quiet moment to repair perceptions would potentially impede the perception of an upcoming utterance, making continuous communication in real-world scenarios difficult, especially for individuals with hearing loss. The methods used in this study—along with some simple modifications if necessary—are clinically feasible and could potentially identify patients with hearing loss who retroactively repair mistakes, ultimately leading to better patient-centered hearing health care.
In standard clinical and experimental evaluations of speech intelligibility, a correct response does not indicate whether the utterance was heard correctly or whether the listener used cognitive processes to correctly reassemble an incomplete perception. The listener's response represents the end product of the combination of the peripheral auditory system and linguistic processing, and it is difficult to differentiate the contributions of each. This means that a potentially effortful cognitive process is invisible during standard testing because listeners can use the time after each stimulus to repair a misperception. Auditory and auditory–linguistic abilities deserve to be evaluated separately, since linguistic compensation could cover up auditory difficulties and vice versa. Additionally, those two distinctly different deficits might be handled differently in clinical intervention. Addressing this issue of precise diagnostics could potentially help explain the barriers to successful social communication that make listening effortful (Hughes et al., 2018, 2021). Listening effort, or the “deliberate allocation of resources to overcome obstacles in goal pursuit when carrying out a listening task” (Pichora-Fuller et al., 2016), certainly has multiple components, but we focus here on how the demands of language processing may be an important but underappreciated source of elevated effort, especially for individuals with hearing loss.
The linguistic structure of speech can enable better perception of upcoming words (Altmann & Kamide, 1999; Pichora-Fuller et al., 1995; Tanenhaus et al., 1995; Walker et al., 2019), as well as the restoration of earlier words (Winn & Teece, 2021a). These beneficial processes are thought to be important for people who have hearing impairment in order to compensate for a degraded auditory signal (Dingemanse & Goedegebure, 2019; Kidd & Humes, 2012; Patro & Mendel, 2016). Notably, in the process of measuring how context influences effort, Winn (2016) and Winn and Moore (2018) showed that effort exerted by listeners who use cochlear implants does not recover as quickly after a sentence is heard. Their results suggested that rather than using contextual information solely to predict upcoming words, listeners with hearing loss may use the moment after the sentence to retroactively repair misperceptions or fill in gaps in a sentence that were not heard clearly. We will refer to this retroactive repair process as cognitive repair.
To date, no clinical assessment explicitly targets the patient’s reliance on an extra moment after an utterance for cognitive repair, which we hypothesize to be a barrier to everyday communication. Although cognitive repair may increase intelligibility scores for single utterances, it is potentially an effortful compensation strategy that can slow down and interfere with the processing of continuous speech. If the process of cognitive repair is a hindrance during everyday conversation, then this is an unfortunate oversight in the evaluation of speech intelligibility, especially for people with hearing loss, who are more likely to make misperceptions that need to be repaired. Unless the communication barriers that are invisible during basic clinical testing are revealed, our understanding of their real-world impact remains incomplete. Time-series measures of speech processing (e.g., eye tracking, pupillometry) are useful for this goal but require specific equipment, training, and time, which are not viable for fast-paced clinic environments (Winn et al., 2018). Therefore, there is a need for a clinically viable test that examines some of the effortful complexities of speech perception experienced by listeners with hearing loss.
The timing of cognitive repair is an important issue because speech flows continuously as sentences follow each other in quick succession (Heldner & Edlund, 2010). If individuals with hearing loss are using a moment after a sentence to process and repair misperceptions, they may miss the next sentence in conversation, which could make communication especially difficult. Some studies on language processing in people with hearing impairment have reinforced this focus on timing. Delays in language processing, specifically for individuals with cochlear implants (CIs), have been shown in lexical decision tasks. These delays, as well as lingering uncertainty about similar words (lexical competition), are prolonged for CI listeners but are more rapidly resolved for individuals with normal hearing (NH; Farris-Trimble et al., 2014; McMurray et al., 2017). Prolonged processing may be important for the use of context (see above) or to resolve lexical ambiguities (Gianakas & Winn, 2019). The pace of the speech plays an important role as well. For example, Piquado et al. (2012) showed that individuals with hearing loss are able to recall narrative passages more accurately when they are allowed to self-pace the presentation of the stories. Additionally, slower speaking rates increase the release from listening effort provided by semantic context (Winn & Teece, 2021b). Previous studies have shown that the extra moment after a sentence is susceptible to interference from ongoing sounds, which can neutralize the benefit of context among listeners with cochlear implants, resulting in increased listening effort (Winn & Moore, 2018).
Interference with the moment of extra linguistic processing of an utterance is a type of informational masking, in which a masking stimulus lowers performance on a speech task not because of energetic overlap with the target speech, but because of other properties that demand cognitive attention. Noise can affect higher level language processing (Sörqvist, 2014, 2015; Zekveld & Kramer, 2014), such as the reassembly of a partially heard sentence (Winn & Teece, 2021a). Moreover, masking spoken words with noise impairs the recall of those words, even if they were correctly perceived at the time of presentation (Cousins et al., 2014). Irrelevant sounds can be equally distracting whether presented simultaneously with the stimuli or afterward during recall tasks (Banbury et al., 2001; Buchner et al., 2008). When an irrelevant noise is presented during a poststimulus retention interval, participants tend to make more errors during serial recall (see, e.g., Experiment 4 from Röer et al., 2014) and fail to show as much release from effort from the sentence context (Winn & Moore, 2018).
A type of informational masking that is more relevant to the focus of this study is backward recognition masking, which refers to how a masker sound presented after a target sound hinders the recognition or identification of the target. For example, the recognition of a sine tone can be hindered by subsequently presenting an interfering sine tone (Divenyi & Hirsh, 1975; Massaro, 1973). Durlach et al. (2003) and Srinivasan et al. (2019) showed that interference due to informational masking was greatest when the masker and target were similar (such as helicopter–car compared with helicopter–faucet). An important distinction here is that backward recognition masking is not backward (detection) masking (Bland & Perrott, 1978), as described by Elliott (1962), which refers to the elevation of a detection threshold due to peripheral interactions of two temporally separate sounds. Taken together, the studies described above show that an irrelevant sound presented after the target stimulus reduces task performance, suggesting that, when perceiving sentences, such postspeech disruption would impair the cognitive processes that listeners might use to repair a misperception.
The ideas described above—relating to the process of invoking short-term memory and manipulating an auditory perception—are central to various theories of processing and effort. For example, Rönnberg et al. (2019) explicitly used the term postdiction to refer to the perceptual repair/reassembly process that would likely be more common in people with hearing impairment. Calling up and retrieving information from semantic long-term memory is thought to underlie some of the effort of listening to speech, according to the Ease of Language Understanding model (Rönnberg et al., 2008, 2019). Additionally, invoking the phonological loop of verbal working memory would be a necessary part of this process (Baddeley, 2000), since the contextual information needed for repair of one word likely depends on extracting information from its neighboring words. Notably, these processes interact with the presence of noise, which can hinder the storage of auditory verbal information and, therefore, inhibit retrieval of that information as it is needed to support repair of an incomplete perception.
To identify individuals who are reliant on a quiet moment after the sentence for cognitive repair, the current study compares speech intelligibility performance when the moment after the sentence is either undisturbed (silent) or disrupted (filled with noise). The difference in performance across those two conditions is interpreted as the impact of the moment after the sentence, which itself is never masked by noise. We used high- versus low-context sentences to explore whether the moment facilitates the use of linguistic context specifically or simply raises performance more generally. We examine interference of postsentence processing in individuals with hearing loss, including listeners who use hearing aids (HAs) and listeners who use CIs. While these groups experience different types of auditory degradation, they are likely to share similar communication difficulties and a need for cognitive repair during speech perception. We hypothesize that listeners will have better performance for sentences that are followed by 2 s of quiet compared with 2 s of noise, and that the benefit will primarily emerge for high-context sentences, consistent with the idea that postsentence processing reflects the use of semantic information to reassemble an incomplete perception (using cues that would not be available in low-context sentences).
Method
Participants
Participants included 43 adults who use hearing aids (n = 12) or CIs (n = 31), between the ages of 23 and 83 years (average age: 63.8 years). See Table 1 for demographic information. All participants were native speakers of North American English, and all CI listeners except one acquired deafness after acquiring spoken language. Of the CI listeners, 20 were tested at the University of Minnesota and 11 were tested at Stanford University. All participants in the hearing aid group were tested at the University of Minnesota. As shown in Table 1, there is a difference in duration of CI experience between participants from Stanford (mean experience = 3.5 years) and Minnesota (mean experience = 9.57 years). The mean device experience for listeners with hearing aids was 9.7 years. The Stanford Ear Institute is a surgical center, which facilitated the recruitment of more recently implanted participants. Experimental protocols were approved by the institutional review boards of the University of Minnesota and Stanford University. All participants provided written consent prior to their participation.
Table 1.
Participant demographics.
| Participant | Sex | Age | Ear(s) | Device experience (y) | Company | Device | Etiology |
|---|---|---|---|---|---|---|---|
| Cochlear implant users | |||||||
| C114 | M | 77 | Bilateral | 13 | AB | Naida Q90 | Progressive, unknown |
| C115 | F | 82 | Bilateral | 23 | Cochlear | N7 | Otosclerosis |
| C119 | NB | 24 | Bilateral | 19 | Cochlear | N7 | Congenital, unknown |
| C126 | F | 74 | Bilateral | 7 | MED-EL | Sonnet | Unknown |
| C130 | M | 67 | Right | 2.5 | MED-EL | Sonnet | Family history |
| C131 | F | 71 | Right | 7 | Cochlear | N6 | Chronic middle ear issues |
| C134 | F | 64 | Bilateral | 8 | Cochlear | NR | Unknown |
| C136 | M | 83 | Left | 5 | AB | Naida | Sudden SNHL |
| C138 | F | 62 | Bilateral | 30 | AB | Naida | Unknown |
| C139 | F | 63 | Bilateral | 9 | AB | Naida Q70 | Family history |
| C141 | F | 75 | Right | 9 | AB | NR | Genetic |
| C143 | F | 65 | Bilateral | 4 | Cochlear | NR | Sudden SNHL |
| C144 | F | 64 | Bilateral | 17 | Cochlear | NR | Measles |
| C145 | M | 55 | Bilateral | 8 | Cochlear | NR | Meniere's disease |
| C146 | F | 58 | Bilateral | 9 | Cochlear | N7 | Childhood virus |
| C148 | M | 72 | Left | 3.5 | Cochlear | N7 | Otosclerosis |
| C149 | M | 36 | Right | 1.5 | Cochlear | NR | Meniere's disease |
| C150 | M | 72 | Left | 1 | Cochlear | Kanso 2 | Sudden SNHL |
| C154 | M | 33 | Left | 10 | Cochlear | N7 | Ototoxicity |
| C155 | F | 59 | Right | 5 | Cochlear | Kanso | Progressive, childhood infection |
| S101 | F | 23 | Left | 8.16 | Cochlear | N7 | Genetic |
| S102 | M | 71 | Left | 8.42 | Cochlear | N7 | Progressive SNHL |
| S103 | F | 71 | Right | 0.2 | MED-EL | Sonnet | Progressive SNHL |
| S104 | M | 37 | Right | 0.4 | Cochlear | Kanso | MIDD |
| S105 | F | 51 | Right | 1.58 | MED-EL | Sonnet | Genetic |
| S106 | M | 81 | Right | 0.88 | Cochlear | N7 | Progressive SNHL |
| S107 | F | 62 | Right | 0.3 | AB | Naida | Progressive |
| S108 | M | 82 | Right | 0.38 | MED-EL | Sonnet | Unknown |
| S109 | F | 51 | Right | 11 | MED-EL | Opus2 | Genetic |
| S110 | M | 67 | Left | 5.25 | MED-EL | Rondo2 | Meniere’s disease |
| S111 | F | 53 | Bilateral | 1.92 | Cochlear | N5 | Meningitis |
| Hearing aid users | |||||||
| H101 | M | 57 | Bilateral | 5 | Unitron | Moxi R | Noise-induced |
| H102 | M | 80 | Bilateral | 21 | Starkey | Livio AI | Unknown |
| H104 | F | 73 | Bilateral | 3 | Costco | NR | Unknown |
| H105 | F | 76 | Bilateral | 13 | Phonak | Audeo V70 | Progressive, unknown |
| H106 | F | 80 | Bilateral | 14 | Oticon | NR | Noise-induced |
| H107 | F | 70 | Bilateral | 7 | Oticon | NR | Unknown |
| H108 | M | 80 | Bilateral | 7 | Beltone | NR | Noise-induced |
| H111 | M | 70 | Bilateral | 9 | Microtech | Curve m7 | Noise-induced |
| H112 | M | 70 | Bilateral | 2 | Vista | T Rd 13 | Noise-induced |
| H115 | F | 77 | Bilateral | 1 | Costco | Preza | Chronic middle ear issues |
| H116 | F | 83 | Bilateral | 31 | Resound | AZ71 | Meniere's disease |
| H117 | M | 26 | Bilateral | 4 | ReSound | One 961 | Noise-induced |
Note. Participant codes beginning with C (Minnesota) or S (Stanford) refer to individuals with cochlear implants, and codes beginning with H refer to individuals with hearing aids. NR indicates the participant did not report their specific device. MIDD refers to maternally inherited diabetes and deafness. SNHL refers to sensorineural hearing loss. M = male; F = female; NB = nonbinary.
Stimuli
Stimuli were a subset of the original recordings of sentences in the Revised Speech Perception in Noise Test (R-SPIN; Bilger et al., 1984), which were spoken by an adult male native speaker of American English. The R-SPIN sentences presented were either low context or high context. The purpose of having context as a factor in the stimuli was to explore whether contextual information can be used after the sentence is heard to retroactively fix misperceptions. The same high- and low-context sentences used by Winn and Moore (2018) were used in this study. These sentences were selected based on the authors' judgment of the best examples of high-context sentences while avoiding antiquated language or topics. For example, a high-context sentence in this experiment would be “She lit a candle with a match,” whereas a low-context sentence would be “She thought about the match.” Additionally, the same target word appeared in both a low-context and a high-context sentence within a single list only twice in the entire stimulus set. The lists were counterbalanced across participants and conditions, minimizing potential condition-related effects on performance. The low-context sentences were randomly selected from the entire R-SPIN corpus. There was a total of 150 sentences divided into six lists of 25 sentences. Four of the six lists were randomly chosen for each participant, who heard 100 total sentences. Each list contained 12 or 13 low- and high-context sentences, which were randomly distributed throughout the list.
Conditions
There were two conditions: the 2-s interval (moment) after each sentence contained either silence (S-Q) or babble noise (S-N). The babble noise was set to the same intensity as the sentences; all stimuli were played at 65 dBA through a single loudspeaker. The babble consisted of eight adult talkers, including both women and men, with no distinctly intelligible words in the stream. For each condition, there were two lists of 25 sentences. An example trial timeline is displayed in Figure 1.
Figure 1.
Timeline of a single trial, depicting the presentation of the sentence followed by the interval period containing silence (undisrupted moment) or noise (disrupted moment).
Procedure
Participants heard one sentence at a time and were prompted to repeat the sentence 2 s later. The 2-s interval following the sentence was either silent or filled with babble noise, and this feature was consistent within a block. That is, the listener knew in advance whether each sentence in a 25-sentence list would be followed by silence or noise. The silent and noise postsentence conditions alternated across testing blocks for each participant. The testing conditions and protocol were equivalent at both testing sites. The experiment was run using custom MATLAB software and was carried out in a sound-attenuating booth. Sounds were presented at approximately 65 dB SPL from a free-field loudspeaker (PreSonus Eris E5). Examples of each type of sentence are available in the supplemental materials: Supplemental Material S1 is a low-context sentence followed by silence, Supplemental Material S2 is a low-context sentence followed by noise, Supplemental Material S3 is a high-context sentence followed by silence, and Supplemental Material S4 is a high-context sentence followed by noise.
Analysis
Intelligibility Scoring
Speech intelligibility was scored in real time by an experimenter based on the participant's verbal responses. If any portion of a word was incorrect, the word was scored as incorrect. Incorrect word responses were categorized as errors in the target word (the last word of the sentence) or in the leadup (any word prior to the target word), following the approach used previously by our lab (Winn & Teece, 2021a, 2021b). This approach recognizes that words in a sentence are not independent (Pedersen & Juhl, 2017) and, therefore, should not be treated independently, as they would be in a tally of basic word-level intelligibility. On the surface, this approach appears to have less granularity because fewer words are tracked individually. However, the leadup words are treated as a single unit that plays a different role than the final word. This approach also mitigates some of the differences in structure between the high- and low-context sentences, which might otherwise have led to an unfair comparison of eligible key words across the two types.
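As a minimal illustration of this scoring scheme (the actual scoring was done live by an experimenter; the data frame and column names below are hypothetical), word-level scores can be collapsed into the two trial-level variables used in the analysis:

```r
library(dplyr)

# Hypothetical word-level scores for two trials; each word is marked correct/incorrect.
word_scores <- data.frame(
  trial    = c(1, 1, 1, 2, 2, 2),
  position = c("leadup", "leadup", "target", "leadup", "leadup", "target"),
  correct  = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Collapse to one row per trial: accuracy of the final (target) word,
# plus a flag for whether any leadup word was misperceived.
trial_scores <- word_scores %>%
  group_by(trial) %>%
  summarize(
    target_correct = correct[position == "target"],
    leadup_error   = any(!correct[position == "leadup"])
  )
```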
Intelligibility was compared across conditions to determine the benefit of an extra moment of silence after the sentence. There were two ways of framing the visualization, shown in Figure 2. Panel A shows the percentage-point change between conditions, subtracting the S-N score from the S-Q score; this style was chosen because percentage points are commonly used in clinics. Panel B shows the second style of visualization, which conveys the same information expressed instead as the proportion of available intelligibility gained, using the S-N score as the starting point. This number is the difference in percentage points across conditions divided by the points available to gain (i.e., headroom). For example, improvement from 60% for S-N to 75% for S-Q is a gain of 15 percentage points out of 40 available headroom points, or a 37.5% headroom gain when the extra moment was silent. The proportional expression is used here, as it is for other perceptual measures such as audiovisual benefit, to differentiate small gains that result from small benefit from small gains that result from better initial scores (Sumby & Pollack, 1954). Although these visualizations capitalize on long-term averages, the statistical analyses described below operated on a trial-by-trial basis, which did not account for headroom (since headroom could only be known after full data aggregation).
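The two framings can be computed directly from the condition scores; the short R snippet below simply reproduces the worked example from the text (the scores of 60% and 75% are hypothetical).

```r
score_SN <- 60  # percent correct when the sentence was followed by noise
score_SQ <- 75  # percent correct when the sentence was followed by quiet

pp_change     <- score_SQ - score_SN         # Panel A: 15 percentage points
headroom      <- 100 - score_SN              # 40 points available to gain
headroom_gain <- 100 * pp_change / headroom  # Panel B: 37.5% of available headroom
```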
Figure 2.
Example calculation of benefit of an extra moment of silence after a sentence on target word intelligibility. Panel A shows the percentage point change, and Panel B shows the obtained percentage of available improvement (headroom).
Results
Target (sentence-final) word accuracy was analyzed using a generalized linear (binomial) mixed-effects model (GLMM) with the lme4 package (Bates et al., 2015) in the R software environment (R Core Team, 2022). A binomial analysis was used because target word accuracy was analyzed at the individual trial level as either correct (1) or incorrect (0). To assess the contribution of each predictor in the model, we began with an intercept-only model and used the likelihood ratio test (LRT) to compare each successive model that differed from the previous model by a single parameter. Compared with the intercept-only model with a random intercept per listener, the addition of the term context (high or low context) provided significant additional explanatory power (χ2 = 649.6, p < .001). Next, the addition of condition (quiet or noise after the sentence) provided further significant improvement (χ2 = 11.3, p < .001). Following the approach of Winn and Teece (2021a, 2021b), leadup errors (the presence or absence of intelligibility errors before the final word) were included to estimate how those errors potentially impacted intelligibility for the sentence-final target word. The presence of leadup errors significantly contributed to the model (χ2 = 228.9, p < .001). The interaction of context and leadup errors provided further improvement (χ2 = 157.4, p < .001), supporting the notion that the benefit of context should be related to the accuracy of perceiving the words that create the context. The separate additions of the Context × Condition and Condition × Leadup Errors interactions, as well as additional random effects of context and leadup errors per listener, all failed to contribute significantly (all χ2 < 3.9, all p > .27). The models were also ranked by the corrected Akaike information criterion (AICc) using the AICcmodavg package (Mazerolle, 2020). The AICc results agreed with the LRT results, indicating that the model with the most empirical support given the data was the following:
Target Word Accuracy ~ Context + Condition + Leadup Errors + Context × Leadup Errors + (1 | Listener) (1)
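For readers who wish to retrace this model-building procedure, the following R sketch illustrates the stepwise comparisons described above. It is not the study's actual analysis script; the data frame dat and its column names (target_correct, context, condition, leadup_error, listener) are hypothetical placeholders.

```r
library(lme4)        # glmer: binomial mixed-effects models
library(AICcmodavg)  # aictab: AICc ranking of candidate models

m0 <- glmer(target_correct ~ 1 + (1 | listener), data = dat, family = binomial)
m1 <- update(m0, . ~ . + context)               # high vs. low context
m2 <- update(m1, . ~ . + condition)             # quiet vs. noise after the sentence
m3 <- update(m2, . ~ . + leadup_error)          # error(s) before the final word
m4 <- update(m3, . ~ . + context:leadup_error)  # selected model (Equation 1)

anova(m0, m1, m2, m3, m4)                       # likelihood ratio tests of successive models
aictab(list(m0, m1, m2, m3, m4),
       modnames = c("m0", "m1", "m2", "m3", "m4"))  # AICc ranking
```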
To obtain a better picture of the influence of hearing type on these results, the same model selection process was repeated separately for the CI and HA groups. LRT results confirmed that the model selected for the whole-group analysis also best fit the data for CI listeners. In contrast, LRT results pointed to a simpler model for HA listeners that included neither condition nor the interaction between context and leadup errors. However, the subset of HA listeners was relatively small (n = 12) and might have simply lacked the power to detect those effects if they are present. The results for each hearing group are depicted in Figure 3, which shows intelligibility for both the leadup portion and the target words of high- and low-context sentences.
Figure 3.
Intelligibility of two different portions of the sentences (leadup: all words prior to the final target word; target: the final word of the sentence).
Post Hoc Comparisons of Individual Effects
Having established the significance of each of the main effects stated above, post hoc testing was conducted using pairwise comparisons of parameter levels (using the Tukey method for multiple comparisons) from the emmeans package (Lenth, 2022). These results are displayed in Figure 4. Here, the means of each treatment level were compared while averaging over all other treatments (i.e., marginal means), as opposed to comparing simple effects of specific per-parameter configurations as is normally done in a generalized linear model summary table.
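Assuming the selected model object m4 from the earlier sketch, these marginal-means comparisons could be obtained roughly as follows (a sketch, not the study's exact code):

```r
library(emmeans)

emmeans(m4, pairwise ~ context, type = "response")       # high vs. low context
emmeans(m4, pairwise ~ condition, type = "response")     # quiet vs. noise after the sentence
emmeans(m4, pairwise ~ leadup_error, type = "response")  # leadup errors absent vs. present
emmeans(m4, pairwise ~ context | leadup_error,           # context effect at each leadup-error level
        type = "response", adjust = "tukey")
```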
Figure 4.
Pairwise comparisons for each term in the selected model, depicting target word performance for each level of the model term averaged over all levels of the other terms. *Denotes detection of statistical effects with p < .05.
Target word accuracy was significantly higher for high-context sentences compared with low-context sentences (z = −16.1, p < .0001), averaged over all other parameters. Although Figure 4 gives the impression that condition had minimal effect on performance, target word accuracy was statistically higher for sentences followed by a quiet moment than for sentences followed by noise (z = −2.7, p < .01). Target word accuracy was significantly higher for sentences with no leadup errors compared with sentences with leadup errors (z = 15.8, p < .001). Errors in the leadup showed significant effects on target word accuracy for both low-context (z = 3.47, p < .01) and high-context (z = 17.5, p < .001) sentences. The impact of context was even greater when there were no errors in the leadup (z = −21.3, p < .001), understandably confirming that context benefit at least partially hinges on successful perception of the context. When high-context sentences contained leadup errors, performance was statistically indistinguishable from low-context sentences with no leadup errors (z = 0.15, p = .99), confirming the notion that misperceiving context nullifies the benefit of context. Taken together, these results suggest that context aids accurate speech perception and that misperceptions (errors) made earlier in a sentence can influence how context is used and, ultimately, how the rest of the sentence is perceived. Although some individuals showed notable improvement in performance when sentences were followed by quiet instead of noise, and that benefit was mainly for high-context sentences, that specific pattern did not emerge strongly in the statistical analysis for the whole group.
Individual-Level Performance
The clinical focus of this work is to identify individual listeners who rely on this extra moment to repair mistakes made during speech perception. Although the group-level results do not show a substantial effect of the extra moment (silence after the sentence compared with noise after the sentence) on target word intelligibility, there were individual listeners who showed notable changes in performance. Figure 5 shows, for each individual participant, the percentage-point change in intelligibility score when high-context sentences were followed by silence rather than noise. Listeners S109 and C149 showed substantial benefit from a moment of silence after the sentence (lowest portion of Figure 5, Panel B). Some individuals, like C114 and S108 (light gray shaded region), did not show a benefit even though their scores for high-context sentences followed by noise were not at ceiling. Toward the middle of Figure 5 (shaded in light gray), some individuals scored at ceiling in both conditions. It is difficult to determine whether these individuals use the moment after the sentence because they had no opportunity to show a benefit. It would be beneficial to increase the difficulty of this task to determine the extent to which these individuals benefit from an undisrupted moment. At the top of Figure 5, Panel B (shaded in dark gray), some individuals did not benefit from the silent moment; instead, their intelligibility scores decreased unexpectedly. This may be because their scores were at or so close to ceiling that a single mistake would result in a lower score. This variability across a wide range of performance highlights the importance of identifying listeners who are more reliant on cognitive repair, as their challenges may not be apparent in standard clinical testing.
Figure 5.
Individual percentage-point change in intelligibility scores when high-context sentences were followed by quiet rather than noise. The raw intelligibility scores for each condition are displayed in the columns on the left. Proportional gain in intelligibility score is represented by the change in shading of the individual data points.
Discussion
This study measures a listener's reliance on a silent undisrupted moment after an utterance as a unique and often overlooked challenge separate from the difficulties of encoding the speech itself. Noise during the extra moment can interfere with postsensory cognitive disambiguation of a stimulus (Röer et al., 2014; Winn & Moore, 2018), which is distinct from segregating the speech from simultaneous energetic masking. Although not every listener is susceptible to interference after the sentence, some listeners showed notable changes in performance based on the opportunity for 2 s of postsentence silent time to reconstruct the perception. These are the listeners whose hearing difficulty is likely to be underestimated by traditional tests that present a single utterance at a time, with opportunity to mentally repair and improve performance in the moments that follow.
Current clinical tests have been judged to be insufficiently sensitive to patient difficulties, motivating development of more challenging and sensitive tools (cf. Gifford et al., 2008, 2018; Spahr et al., 2014). Here, we highlight how the current task does not merely provide more challenge (as would be created by increasing background noise) but rather a different kind of challenge: disrupting the moment after a sentence that might be a critical part of a listener's success in exploiting linguistic processing. Speech intelligibility tests are sometimes designed to avoid the influence of language processing in order to better evaluate how a patient's auditory system contributes to speech perception (i.e., to assess bottom-up processes). However, language processing is part of everyday conversation, and avoiding this aspect of speech perception means forgoing a full understanding of a patient's difficulties. A patient may appear to be a high performer but struggle in real-world conversations that take place in various listening environments. Speech recognition tests that focus on examining auditory function, while useful, are not suitable for understanding the wide variety of factors that affect speech recognition in regular everyday situations. The test used in this experiment, while limited in various ways, directly recognizes the influence of the language processing required during everyday conversation and may better align with a patient's experiences or difficulties.
Although there remain important extra steps to take to ensure ecological validity of stimuli (coherent context across utterances, informational dysfluencies, opportunities for interaction, etc.), we emphasize here that the distinction between contemporaneous auditory encoding and postutterance processing is a potentially meaningful factor for people who have hearing loss. This study does not use postsentence noise because it is representative of everyday situations; rather, the noise serves merely as a probe for a specific tendency or listening strategy. A more salient postsentence sound could lead to greater differences across individuals because of variation in the ability to suppress distraction, but at this time, we intend to emphasize the mere presence of a contribution of postsentence processing before exploring all of the details that might affect or interact with this factor.
There are numerous limitations to measuring the benefit of a quiet moment after a sentence is spoken. First, there might be a mundane limitation in measurement: if performance is already near 100% when the moment after the sentence is disrupted, there will not be a way to clearly observe the need for postsentence processing. Conversely, if performance is very low, then the words that would normally support repair of a missing word are themselves unavailable, and the repair cannot happen; this would be an example of a data limitation on the process of repair (invoking terminology used by Norman and Bobrow [1975]). This is a general principle that could extend to other kinds of beneficial processes, such as spatial release from masking; if the noise is so loud that the speech is completely undetectable, then spatial separation of that speech from masking noise would not provide benefit. More specific to real-world generalization, the listener might not have sufficient time to execute the repair process, which Norman and Bobrow would classify separately as a resource limitation. Another notable limitation is that the noise-after-sentence condition in the current task depends essentially on suppression of the postsentence noise, which would emerge as performance improvement in our metric but which would not be a sustainable strategy in real-world conversation, where ongoing attention must continue to the next utterance rather than being completely suppressed.
The results of this study demonstrate that reliance on extra processing time to use context can be measured behaviorally and can successfully identify individuals who rely on this retroactive compensatory strategy during speech perception. This experiment represents the first steps toward a clinically viable test, but important milestones remain. The benefits of this test are that it uses stimuli that are readily available and requires no extra equipment or special training, unlike other measures of effortful listening such as pupillometry or eye tracking. However, the testing time would need to be shortened to be feasible during standard appointments. One path to shortening the test would be to omit the low-context sentences, since they tended not to show any effect of extra-moment processing. Additionally, the clinician needs a criterion for deciding how much reliance on an extra moment is clinically meaningful. This study does not have a sample size sufficient for establishing normative data, and the literature lacks clear guidelines for clinically meaningful differences in sentence recognition scores. There are such guidelines for word recognition (Carney & Schlauch, 2007; Thornton & Raffin, 1978), but they would not be an ideal solution for the current test for various reasons. First, Thornton and Raffin measured performance for single words, which are qualitatively different from full sentences. Second, there is no guarantee that variability in sentence recognition scores (which determines meaningful differences across the performance range) should necessarily adhere to an ideal binomial probability function, even though that is the approach taken for expedient statistical analysis. Finally, it is possible that the factors that produce a conditional effect on sentence recognition (namely, the presence of postsentence noise) are simply more subtle than the factors that affect sentence recognition on a more basic level. Although we do not have a specific statistically informed cutoff, we suggest a 12% difference between scores for sentences before noise versus before quiet as a reasonable starting point as we continue to refine this measurement.
An important caveat to the potential clinical applicability of this test is that ceiling performance when a sentence is followed by noise will entirely obscure the potential benefit of having a silent moment after a sentence. A sizable number of participants had excellent performance even when sentences were followed by noise, so we do not know whether these listeners might still benefit from undisrupted postsentence processing in a way that is simply not captured by target word accuracy scores. Physiological approaches demonstrate that the ease of language processing can provide relief from listening effort (Winn, 2016). In the absence of physiological measures of effort, which are currently not feasible for clinics, two potential solutions for this conundrum can be envisioned. First, more challenging materials—such as AzBio sentences—could be used as the target sentences. Alternatively, one could express the benefit of a quiet moment after the sentence in terms of the reduction in SNR that still permits criterion performance. For example, a listener might need −2 dB SNR to achieve 70% correct when the masking noise continues past the end of the sentence, but only −6 dB SNR to achieve 70% correct when the noise is truncated immediately after the sentence, leaving a quiet moment for reconstruction; this would reflect 4 dB of masking release due to the extra quiet moment and could be tested at a performance criterion that avoids ceiling effects. However, ceiling-level performance is likely not the only determiner of individual reliance on a silent extra moment after the sentence, as variations in working memory and linguistic processing speed could feasibly aid performance. In the absence of executive function measures for participants in the current study, those impacts remain to be seen.
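Expressed in code, the proposed SNR-based metric is simply the difference between the two criterion SNRs; the values below are the hypothetical ones from the example above.

```r
snr_noise_continues <- -2  # dB SNR for 70% correct when noise continues past the sentence
snr_noise_truncated <- -6  # dB SNR for 70% correct when a quiet moment follows the sentence

masking_release <- snr_noise_continues - snr_noise_truncated  # 4 dB of release from the quiet moment
```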
The use of postsensory repair strategies may have implications for access to health care and for documenting progress with assistive devices. For example, cochlear implant candidates might be disqualified from implantation because they use cognitive repair to push their intelligibility scores outside the candidacy range. This means they would be perversely punished for exerting extra effort to recognize speech. This test could supplement current cochlear implant candidacy testing in order to identify individuals who rely on a mental repair strategy that is effortful (Winn & Teece, 2021a). Measuring a listener's reliance on an extra moment could also be part of documenting their progress over time. For example, individuals who receive hearing aids might show no improvement in intelligibility scores because those scores, as traditionally tested, do not reveal the extent to which the listener must rely on an extra moment after the sentence. However, the meaningful outcome in such a case could be that they no longer rely as heavily on cognitive repair strategies when listening to speech. Taken together, a benefit of this test is that it would help prevent individuals with hearing loss from slipping through the cracks and prevent true benefit from going unrecognized.
A unique component of this experiment is its focus on the timing of speech. Everyday conversation can be difficult to keep up with, especially for an individual with hearing loss who is attempting to retroactively repair misperceptions while more speech is heard. Individuals who rely on cognitive repair are not afforded the extra time to make these restorations, leaving them at risk of falling behind in conversation.
The time or space between utterances is important not only for cognitive repair but also for general social interaction. Roberts et al. (2006) examined how the duration of silence between speaker turns influenced how a conversation partner perceived a conversational interaction. In their study, participants heard requests or statements and their corresponding responses and were tasked with determining whether the individual answering showed disagreement with or unwillingness toward the utterance. Their results showed that as the silent duration following an utterance increased, listeners perceived more unwillingness toward the request or disagreement with the statement. Even across different languages (English, Japanese, and Italian), longer durations of silence after requests were perceived negatively (Roberts et al., 2011). These findings are important in the context of hearing loss because a person who needs extra time to mentally correct mistakes made during speech perception might mistakenly seem less willing or in disagreement with a conversation partner because it takes them longer to respond. Listeners also take longer to respond when hearing acoustically degraded speech (Gatehouse & Gordon, 1990; Pals et al., 2013). The consequent impact on social interactions and trust is not known but could be important for people with hearing difficulty—especially children, who can struggle with establishing social connections (Percy-Smith et al., 2008), a difficulty that has been shown not to be predictable from speech intelligibility alone (Nicholas & Geers, 2003).
It is important to note that most of the participants were tested at the University of Minnesota, where participants are typically well experienced with their devices and are regular research participants. This means they are likely enjoying greater-than-average success with their devices and also have the privilege of visiting the lab during normal business hours. In contrast, the Stanford Ear Institute is a surgical center, meaning that those participants may have different constraints, motivations, and experiences. Many of the Stanford CI participants showed the greatest impact of an undisrupted extra moment after the stimuli. This might have been due to generally more average performance or perhaps the relatively shorter duration of device use. The CI participants from Stanford are less experienced with their devices and therefore might not have been acclimatized to their devices long enough to have the same ability to suppress the noxious perception of noise. If so, then the corresponding heightened physiological state of arousal (rather than mere linguistic processing time) might have negatively impacted cognitive processing, consistent with arguments made by Schuerman et al. (2022).
Most of the participants in this study were older adults, which is important to note because cognitive abilities can change with age. Older adults may rely more than younger adults on top-down processes to compensate for decreased hearing acuity (Ayasse & Wingfield, 2020; McGarrigle et al., 2021; Tun et al., 2012; Wingfield et al., 2005; Zhao et al., 2019), and aging is associated with declines in auditory functioning (Peelle & Wingfield, 2016) as well as in cognitive skills such as inhibiting irrelevant information (Braver & Barch, 2002; Tun et al., 2002). These factors should be taken into consideration when contextualizing results on any test, but especially the current one, as it focuses on cognitive processing and the ability to prevent noise from disrupting linguistic processing. In this study, age was not significantly associated with intelligibility for sentences before quiet or before noise, or with the amount of benefit from an undisrupted moment after the sentence (whether expressed as percentage points or as a proportion of headroom gain).
Conclusions
Some individuals with hearing impairment exploit a quiet moment after a sentence to improve their apparent intelligibility score. Disrupting that moment with a short burst of noise can be a step toward understanding the separate contribution of auditory encoding as distinct from postperception cognitive repair. Not all listeners showed a benefit of a silent postsentence moment, perhaps because of ceiling effects or because of the relative simplicity of the current testing materials, which are methodological details that can be easily remedied to improve this style of testing. Although using cognitive repair can be advantageous to improve scores in clinical testing, it may break down during fast-paced real-world communication because it could interfere with listening to the subsequent sentence. Therefore, it is important to identify individuals that rely on cognitive repair in order to provide them with better patient-centered care that recognizes the realistic challenges they face in everyday life.
Data Availability Statement
Data for this study are available at https://osf.io/jrq5k/.
Acknowledgments
Portions of this work were presented at the Conference on Implantable Auditory Prosthesis (Lake Tahoe, CA; 2019) and the association For Research in Otolaryngology Midwinter Meeting (San Jose, CA; 2020). Financial support for this project was provided by the National Institute on Deafness and Other Communication Disorders (NIDCD) Grant F32DC019301 (Gianakas), National Science Foundation NRT-UtB1734815 (Gianakas), NIDCD Grant R01 DC017114 (Winn), and Stanford University (Fitzgerald). Data collection was assisted by Kate Teece, Paula Rodriguez, Siuho Gong, Emily Hugo, Hannah Matthys, and Lindsay Williams. We appreciate Jan Larky, Sarah Pirko, Jaclyn Moor, and Mateel Musallam of the Stanford Ear Institute's Cochlear Implant Audiology Team for their efforts in recruiting participants. Valuable input on this project was given by Peggy Nelson and Andrew Oxenham. The University of Minnesota stands on Miní Sóta Makhóčhe, the homelands of the Dakhóta Oyáte.
References
- Altmann, G. T. M. , & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73(3), 247–264. https://doi.org/10.1016/S0010-0277(99)00059-1 [DOI] [PubMed] [Google Scholar]
- Ayasse, N. D. , & Wingfield, A. (2020). Anticipatory baseline pupil diameter is sensitive to differences in hearing thresholds. Frontiers in Psychology, 10, 2947. https://doi.org/10.3389/fpsyg.2019.02947 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baddeley, A. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417–423. https://doi.org/10.1016/S1364-6613(00)01538-2 [DOI] [PubMed] [Google Scholar]
- Banbury, S. P. , Macken, W. J. , Tremblay, S. , & Jones, D. M. (2001). Auditory distraction and short-term memory: Phenomena and practical implications. Human Factors, 43(1), 12–29. https://doi.org/10.1518/001872001775992462 [DOI] [PubMed] [Google Scholar]
- Bates, D. , Mächler, M. , Bolker, B. , & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1). https://doi.org/10.18637/jss.v067.i01 [Google Scholar]
- Bilger, R. C. , Nuetzel, J. M. , Rabinowitz, W. M. , & Rzeczkowski, C. (1984). Standardization of a test of speech perception in noise. Journal of Speech, Language, and Hearing Research, 27(1), 32–48. https://doi.org/10.1121/1.2017541 [DOI] [PubMed] [Google Scholar]
- Bland, D. E. , & Perrott, D. R. (1978). Backward masking: Detection versus recognition. The Journal of the Acoustical Society of America, 63(4), 1215–1217. https://doi.org/10.1121/1.381840 [DOI] [PubMed] [Google Scholar]
- Braver, T. S. , & Barch, D. M. (2002). A theory of cognitive control, aging cognition, and neuromodulation. Neuroscience & Biobehavioral Reviews, 26(7), 809–817. https://doi.org/10.1016/s0149-7634(02)00067-2 [DOI] [PubMed] [Google Scholar]
- Buchner, A. , Bell, R. , Rothermund, K. , & Wentura, D. (2008). Sound source location modulates the irrelevant-sound effect. Memory & Cognition, 36(3), 617–628. https://doi.org/10.3758/MC.36.3.617 [DOI] [PubMed] [Google Scholar]
- Carney, E. , & Schlauch, R. S. (2007). Critical difference table for word recognition testing derived using computer simulation. Journal of Speech, Language, and Hearing Research, 50(5), 1203–1209. https://doi.org/10.1044/1092-4388(2007/084) [DOI] [PubMed] [Google Scholar]
- Cousins, K. A. Q. , Dar, H. , Wingfield, A. , & Miller, P. (2014). Acoustic masking disrupts time-dependent mechanisms of memory encoding in word-list recall. Memory & Cognition, 42(4), 622–638. https://doi.org/10.3758/s13421-013-0377-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dingemanse, J. G. , & Goedegebure, A. (2019). The important role of contextual information in speech perception in cochlear implant users and its consequences in speech tests. Trends in Hearing, 23. https://doi.org/10.1177/2331216519838672 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Divenyi, P. L. , & Hirsh, I. J. (1975). The effect of blanking on the identification of temporal order in three-tone sequences. Perception & Psychophysics, 17(3), 246–252. https://doi.org/10.3758/BF03203207 [Google Scholar]
- Durlach, N. I. , Mason, C. R. , Shinn-Cunningham, B. G. , Arbogast, T. L. , Colburn, H. S. , & Kidd, G., Jr. (2003). Informational masking: Counteracting the effects of stimulus uncertainty by decreasing target-masker similarity. The Journal of the Acoustical Society of America, 114(1), 368–379. https://doi.org/10.1121/1.1577562 [DOI] [PubMed] [Google Scholar]
- Elliott, L. L. (1962). Backward and forward masking of probe tones of different frequencies. The Journal of the Acoustical Society of America, 34(8), 1116–1117. https://doi.org/10.1121/1.1918254 [Google Scholar]
- Farris-Trimble, A. , McMurray, B. , Cigrand, N. , & Tomblin, J. B. (2014). The process of spoken word recognition in the face of signal degradation. Journal of Experimental Psychology: Human Perception and Performance, 40(1), 308–327. https://doi.org/10.1037/a0034353 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gatehouse, S. , & Gordon, J. (1990). Response times to speech stimuli as measures of benefit from amplification. British Journal of Audiology, 24(1), 63–68. https://doi.org/10.3109/03005369009077843 [DOI] [PubMed] [Google Scholar]
- Gianakas, S. P. , & Winn, M. B. (2019). Lexical bias in word recognition by cochlear implant listeners. The Journal of the Acoustical Society of America, 146(5), 3373–3383. https://doi.org/10.1121/1.5132938 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gifford, R. H. , Noble, J. H. , Camarata, S. M. , Sunderhaus, L. W. , Dwyer, R. T. , Dawant, B. M. , Dietrich, M. S. , & Labadie, R. F. (2018). The relationship between spectral modulation detection and speech recognition: Adult versus pediatric cochlear implant recipients. Trends in Hearing, 22. https://doi.org/10.1177/2331216518771176 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gifford, R. H. , Shallop, J. K. , & Peterson, A. M. (2008). Speech recognition materials and ceiling effects: Considerations for cochlear implant programs. Audiology and Neurotology, 13(3), 193–205. https://doi.org/10.1159/000113510 [DOI] [PubMed] [Google Scholar]
- Heldner, M. , & Edlund, J. (2010). Pauses, gaps and overlaps in conversations. Journal of Phonetics, 38(4), 555–568. https://doi.org/10.1016/j.wocn.2010.08.002 [Google Scholar]
- Hughes, S. E. , Aiyegbusi, O. L. , Lasserson, D. , Collis, P. , Glasby, J. , & Calvert, M. (2021). Patient-reported outcome measurement: A bridge between health and social care? Journal of the Royal Society of Medicine, 114(8), 381–388. https://doi.org/10.1177/01410768211014048 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hughes, S. E. , Hutchings, H. A. , Rapport, F. L. , McMahon, C. M. , & Boisvert, I. (2018). Social connectedness and perceived listening effort in adult cochlear implant users: A grounded theory to establish content validity for a new patient-reported outcome measure. Ear and Hearing, 39(5), 922–934. https://doi.org/10.1097/AUD.0000000000000553 [DOI] [PubMed] [Google Scholar]
- Kidd, G. R. , & Humes, L. E. (2012). Effects of age and hearing loss on the recognition of interrupted words in isolation and in sentences. The Journal of the Acoustical Society of America, 131(2), 1434–1448. https://doi.org/10.1121/1.367597 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lenth, R. V. (2022). emmeans: Estimated marginal means, aka least-squares means. R package version 1.7.5. https://CRAN.R-project.org/package=emmeans
- Massaro, D. W. (1973). A comparison of forward versus backward recognition masking. Journal of Experimental Psychology, 100(2), 434–436. https://doi.org/10.1037/h0035442 [DOI] [PubMed] [Google Scholar]
- Mazerolle, M. J. (2020). AICcmodavg: Model selection and multimodel inference based on (Q)AIC(c). R package version 2.3–1. https://cran.r-project.org/package=AICcmodavg
- McGarrigle, R. , Knight, S. , Rakusen, L. , Geller, J. , & Mattys, S. (2021). Older adults show a more sustained pattern of effortful listening than young adults. Psychology and Aging, 36(4), 504–519. https://doi.org/10.1037/pag0000587 [DOI] [PubMed] [Google Scholar]
- McMurray, B. , Farris-Trimble, A. , & Rigler, H. (2017). Waiting for lexical access: Cochlear implants or severely degraded input lead listeners to process speech less incrementally. Cognition, 169, 147–164. https://doi.org/10.1016/j.cognition.2017.08.013 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nicholas, J. G. , & Geers, A. E. (2003). Personal, social, and family adjustment in school-aged children with a cochlear implant. Ear and Hearing, 24(1), 69S–81S. https://doi.org/10.1097/01.AUD.0000051750.31186.7A [DOI] [PubMed] [Google Scholar]
- Norman, D. A. , & Bobrow, D. G. (1975). On data-limited and resource-limited processes. Cognitive Psychology, 7(1), 44–64. https://doi.org/10.1016/0010-0285(75)90004-3 [Google Scholar]
- Pals, C. , Sarampalis, A. , & Baskent, D. (2013). Listening effort with cochlear implant simulations. Journal of Speech, Language, and Hearing Research, 56(4), 1075–1084. https://doi.org/10.1044/1092-4388(2012/12-0074) [DOI] [PubMed] [Google Scholar]
- Patro, C. , & Mendel, L. L. (2016). Role of contextual cues on the perception of spectrally reduced interrupted speech. The Journal of the Acoustical Society of America, 140(2), 1336–1345. https://doi.org/10.1121/1.4961450 [DOI] [PubMed] [Google Scholar]
- Pedersen, E. R. , & Juhl, P. M. (2017). Simulated critical differences for speech reception thresholds. Journal of Speech, Language, and Hearing Research, 60(1), 238–250. https://doi.org/10.1044/2016_JSLHR-H-15-0445 [DOI] [PubMed] [Google Scholar]
- Peelle, J. E. , & Wingfield, A. (2016). The neural consequences of age-related hearing loss. Trends in Neurosciences, 39(7), 486–497. https://doi.org/10.1016/j.tins.2016.05.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Percy-Smith, L. , Jensen, J. H. , Cayé-Thomasen, P. , Thomsen, J. , Gudman, M. , & Lopez, A. G. (2008). Factors that affect the social well-being of children with cochlear implants. Cochlear Implants International, 9(4), 199–214. https://doi.org/10.1002/cii.368 [DOI] [PubMed] [Google Scholar]
- Pichora-Fuller, M. K. , Schneider, B. A. , & Daneman, M. (1995). How young and old adults listen to and remember speech in noise. The Journal of the Acoustical Society of America, 97(1), 593–608. https://doi.org/10.1121/1.412282 [DOI] [PubMed] [Google Scholar]
- Pichora-Fuller, M. K. , Kramer, S. E. , Eckert, M. A. , Edwards, B. , Hornsby, B. W. , Humes, L. E. , Lemke, U. , Lunner, T. , Matthen, M. , Mackersie, C. L. , Naylor, G. , Phillips, N. A. , Richter, M. , Rudner, M. , Sommers, M. S. , Tremblay, K. L. , & Wingfield, A. (2016). Hearing impairment and cognitive energy: The Framework for Understanding Effortful Listening (FUEL). Ear and Hearing, 37(Suppl. 1), 5S–27S. https://doi.org/10.1097/AUD.0000000000000312 [DOI] [PubMed] [Google Scholar]
- Piquado, T. , Benichov, J. I. , Brownell, H. , & Wingfield, A. (2012). The hidden effect of hearing acuity on speech recall, and compensatory effects of self-paced listening. International Journal of Audiology, 51(8), 576–583. https://doi.org/10.3109/14992027.2012.684403 [DOI] [PMC free article] [PubMed] [Google Scholar]
- R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/ [Google Scholar]
- Roberts, F. , Francis, A. L. , & Morgan, M. (2006). The interaction of inter-turn silence with prosodic cues in listener perceptions of “trouble” in conversation. Speech Communication, 48(9), 1079–1093. https://doi.org/10.1016/j.specom.2006.02.001 [Google Scholar]
- Roberts, F. , Margutti, P. , & Takano, S. (2011). Judgments concerning the valence of inter-turn silence across speakers of American English, Italian, and Japanese. Discourse Processes, 48(5), 331–354. https://doi.org/10.1080/0163853X.2011.558002 [Google Scholar]
- Röer, J. P. , Bell, R. , & Buchner, A. (2014). What determines auditory distraction? On the roles of local auditory changes and expectation violations. PLOS ONE, 9(1), Article e84166. https://doi.org/10.1371/journal.pone.0084166 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rönnberg, J. , Holmer, E. , & Rudner, M. (2019). Cognitive hearing science and ease of language understanding. International Journal of Audiology, 58(5), 247–261. https://doi.org/10.1080/14992027.2018.1551631 [DOI] [PubMed] [Google Scholar]
- Rönnberg, J. , Rudner, M. , Foo, C. , & Lunner, T. (2008). Cognition counts: A working memory system for ease of language understanding (ELU). International Journal of Audiology, 47(Suppl. 2), S99–S105. https://doi.org/10.1080/14992020802301167 [DOI] [PubMed] [Google Scholar]
- Schuerman, W. , Chandrasekaran, B. , & Leonard, M. K. (2022). Arousal states as a key source of variability in speech perception and learning. Language, 7(1), 19. https://doi.org/10.3390/languages7010019 [Google Scholar]
- Sörqvist, P. (2014). On interpretation and task selection in studies on the effects of noise on cognitive performance. Frontiers in Psychology, 5, 1249. https://doi.org/10.3389/fpsyg.2014.01249 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sörqvist, P. (2015). On interpretation and task selection: The sub-component hypothesis of cognitive noise effects. Frontiers in Psychology, 5, 1598. https://doi.org/10.3389/fpsyg.2014.01598 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Spahr, A. J. , Dorman, M. F. , Litvak, L. M. , Cook, S. J. , Loiselle, L. M. , Dejong, M. D. , Hedley-Williams, A. , Sunderhaus, L. S. , Hayes, C. A. , & Gifford, R. H. (2014). Development and validation of the pediatric AzBio sentence lists. Ear and Hearing, 35(4), 418–422. https://doi.org/10.1097/AUD.0000000000000031 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Srinivasan, N. K. , Clark, K. , Gaston, J. , & Perelman, B. (2019). The effect of target and masker similarities on backward recognition masking for environmental sounds. Proceedings of Meetings on Acoustics, 36(1), 050004. https://doi.org/10.1121/2.0001082 [Google Scholar]
- Sumby, W. H. , & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. The Journal of the Acoustical Society of America, 26(2), 212–215. https://doi.org/10.1121/1.1907309 [Google Scholar]
- Tanenhaus, M. K. , Spivey-Knowlton, M. J. , Eberhard, K. M. , & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268(5217), 1632–1634. https://doi.org/10.1126/science.7777863 [DOI] [PubMed] [Google Scholar]
- Thornton, A. R. , & Raffin, M. J. (1978). Speech-discrimination scores modeled as a binomial variable. Journal of Speech and Hearing Research, 21(3), 507–518. https://doi.org/10.1044/jshr.2103.507 [DOI] [PubMed] [Google Scholar]
- Tun, P. A. , O'Kane, G. , & Wingfield, A. (2002). Distraction by competing speech in young and older adult listeners. Psychology and Aging, 17(3), 453–467. https://doi.org/10.1037/0882-7974.17.3.453 [DOI] [PubMed] [Google Scholar]
- Tun, P. A. , Williams, V. A. , Small, B. J. , & Hafter, E. R. (2012). The effects of aging on auditory processing and cognition. American Journal of Audiology, 21(2), 344–350. https://doi.org/10.1044/1059-0889(2012/12-0030) [DOI] [PubMed] [Google Scholar]
- Walker, E. A. , Kessler, D. , Klein, K. , Spratford, M. , Oleson, J. J. , Welhaven, A. , & McCreery, R. W. (2019). Time-gated word recognition in children: Effects of auditory access, age, and semantic context. Journal of Speech, Language, and Hearing Research, 62(7), 2519–2534. https://doi.org/10.1044/2019_JSLHR-H-18-0407 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wingfield, A. , Tun, P. A. , & McCoy, S. L. (2005). Hearing loss in older adulthood. Current Directions in Psychological Science, 14(3), 144–148. https://doi.org/10.1111/j.0963-7214.2005.00356.x [Google Scholar]
- Winn, M. B. (2016). Rapid release from listening effort resulting from semantic context, and effects of spectral degradation and cochlear implants. Trends in Hearing, 20. https://doi.org/10.1177/2331216516669723 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Winn, M. B. , & Moore, A. N. (2018). Pupillometry reveals that context benefit in speech perception can be disrupted by later-occurring sounds, especially in listeners with cochlear implants. Trends in Hearing, 22. https://doi.org/10.1177/2331216518808962 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Winn, M. B. , & Teece, K. H. (2021a). Listening effort is not the same as speech intelligibility score. Trends in Hearing, 25. https://doi.org/10.1177/23312165211027688 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Winn, M. B. , & Teece, K. H. (2021b). Slower speaking rate reduces listening effort among listeners with cochlear implants. Ear and hearing, 42(3), 584–595. https://doi.org/10.1097/AUD.0000000000000958 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Winn, M. B. , Wendt, D. , Koelewijn, T. , & Kuchinsky, S. (2018). Best practices and advice for using pupillometry to measure listening effort: An introduction for those who want to get started. Trends in Hearing, 22. https://doi.org/10.1177/2331216518800869 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zekveld, A. A. , & Kramer, S. E. (2014). Cognitive processing load across a wide range of listening conditions: Insights from pupillometry. Psychophysiology, 51(3), 277–284. https://doi.org/10.1111/psyp.12151 [DOI] [PubMed] [Google Scholar]
- Zhao, S. , Bury, G. , Milne, A. , & Chait, M. (2019). Pupillometry as an objective measure of sustained attention in young and older listeners. Trends in Hearing, 23. https://doi.org/10.1177/2331216519887815 [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
Data for this study are available at https://osf.io/jrq5k/.