Author manuscript; available in PMC: 2024 Mar 18.
Published in final edited form as: Int J Audiol. 2022 Dec 15;63(3):199–206. doi: 10.1080/14992027.2022.2150901

Input-related demands: vocoded sentences evoke different pupillometrics and subjective listening effort than sentences in speech-shaped noise

Nicholas P Giuliani a, Soumya Venkitakrishnan b, Yu-Hsiang Wu a,b
PMCID: PMC10947987  NIHMSID: NIHMS1868656  PMID: 36519812

Abstract

Objectives:

The Framework for Understanding Effortful Listening (FUEL) suggests that five input-related demands can alter listening effort: source, transmission, listener, message and context factors. We hypothesised that vocoded sentences represent a source factor degradation and that sentences in speech-shaped noise represent a transmission factor degradation. We used pupillometry and a subjective scale to test this hypothesis.

Design:

Participants listened to vocoded sentences and sentences in speech-shaped noise at several difficulty levels designed to produce similar word recognition abilities; they also listened to unprocessed sentences. Within-participant pupillometrics and subjective listening effort were analysed. Post-hoc analyses were performed to examine if word recognition accuracy differentially influenced pupil responses.

Study sample:

Twenty young adults with normal hearing.

Results:

Baseline pupil diameter was significantly smaller, peak pupil dilation was significantly larger, peak pupil dilation latency was significantly shorter, and subjective listening effort was significantly greater for the vocoded sentences than the sentences-in-noise. Word recognition ability also affected pupillometrics, but only for the vocoded sentences.

Conclusions:

Our findings suggest that source factor degradations result in greater listening effort than transmission factor degradations. Future research should address how clinical interventions tailored towards different input-related demands may lead to reduced listening effort and improve patient outcomes.

Keywords: Listening effort, pupillometrics, pupillometry, input-related demands, vocoded

Introduction

Understanding speech with familiar talkers in quiet comes effortlessly to listeners with normal hearing. However, changes to the characteristics of the talker, the listener, or the environment make the previously effortless task more cognitively demanding. It is well-established that more effort is required to maintain task accuracy in the face of increasing cognitive demands (Kahneman et al. 1969; Kahneman 1973). This exertion, when applied to a listening task, has been deemed “listening effort” (McGarrigle et al. 2017; Pichora-Fuller et al. 2016).

Listening effort is multidimensional, meaning that the measurement technique used to capture these cognitive burdens must be carefully selected, as each is sensitive (or insensitive) to different aspects of listening effort. Subjective ratings of listening effort, which are easy to implement and frequently included alongside other measures, rarely correlate well with physiological assessments (Alhanbali et al. 2019; Francis et al. 2021). Therefore, it is important to evaluate listening effort using a combination of objective and subjective tools. While there is no standard, pupillometry has emerged as a reliable method for objectively quantifying listening effort.

Pupillometry

The pupil is innervated by both sympathetic and parasympathetic pathways (Steinhauer et al. 2004; Kramer et al. 2013; Wang et al. 2018), and changes in pupil size due to cognitive demands are governed by sympathetic projections via the locus coeruleus within the pons (Rajkowski, Kubiak, and Aston-Jones 1994; Aston-Jones and Cohen 2005; Gilzenrat et al. 2010; Liu et al. 2017). Pupillometry is the method by which these cognitive changes are measured, and the most common metric is the peak pupil dilation (PPD) response. During a listening task, the pupil dilates and reaches a peak several seconds after stimulus onset (eg Winn et al. 2018). Pupillometry is sensitive to a wide variety of cognitively demanding tasks including poor performance and degradation type (Zekveld, Koelewijn, and Kramer 2018), with increasing cognitive demands evoking progressively larger PPDs.

Importantly, pupillometry is sensitive to cognitive demands even when speech understanding does not change. For example, PPD is significantly larger during mental repair of highly understood sentences (Winn and Teece 2022), while listening to perfectly understood accented speech (McLaughlin and Van Engen 2020), and while participants listen to speech-in-noise with hearing aid digital noise reduction algorithms engaged, even though speech understanding is unchanged (Fiedler et al. 2021; Wendt et al. 2017). Therefore, the PPD is useful for examining cognitive demands over and above those required for speech understanding.

The FUEL (Pichora-Fuller et al. 2016) suggests that input-related demands, which can be subdivided into transmission, source, listener, message and context factors, may evoke varying degrees of listening effort. To date, no study has sought to specifically examine the effect of input-related demands on listening effort. The remainder of this paper will focus on two input-related demands that are frequently manipulated in listening effort studies: source and transmission factors.

Source and transmission factors

Source factors alter listening effort by manipulating characteristics of the talker (ie accent or affect), whereas transmission factors alter listening effort by degrading the transmission medium, for example by lowering the signal-to-noise ratio (SNR) with noise or reverberation (Mattys et al. 2012; Pichora-Fuller et al. 2016; Peelle 2018; Rönnberg, Holmer, and Rudner 2019). A potentially crucial difference between source and transmission factors is that the former often, but not exclusively, occurs without a decline in word understanding abilities (eg McLaughlin and Van Engen 2020). There is one form of speech stimulus whose classification seems to straddle input-related demands: vocoded speech.

Listening to vocoder simulations increases listening effort (Zekveld et al. 2014; Winn, Edwards, and Litovsky 2015; McMahon et al. 2016), but there are data suggesting that their input-related demands are distinct from those of sentences degraded by noise. Winn, Edwards, and Litovsky (2015) examined PPD while participants listened to vocoded speech as the number of vocoder channels was altered. They found that PPD increased as the number of vocoder channels decreased, despite no significant decrement to word understanding for all but the most challenging condition; because the transmission medium was unchanged, this may imply that the listening effort required to attend to vocoded speech is due to a source factor manipulation. Additionally, the authors visually compared their findings with other studies and noted that their pupil responses were larger than those reported for speech understanding in noise even at poor word recognition levels (50% and 29%; Winn, Edwards, and Litovsky 2015). Therefore, even though vocoder simulations and noise both degrade speech, the mechanisms by which they increase listening effort may be distinct. We propose that these differences are best explained by different input-related demands. Clarifying how input-related demands alter listening effort may lead to improved patient outcomes by tailoring audiological interventions to each demand.

Study goals

The goal of the present study was to explore how input-related demands affect listening effort. We used two forms of degraded speech that may reflect different input-related demands: vocoded sentences and sentences-in-noise. We anticipated that common pupillometrics (ie PPD, baseline pupil diameter (BPD)) and subjective ratings would be greater for vocoded sentences than sentences-in-noise when controlling for word recognition ability. We reasoned that these outcomes would reflect two different input-related demands: source and transmission factors. We controlled for the three other input-related demands by recruiting a group of young adults (listener demand) who listened to sentences (message demand) in a controlled laboratory environment (context demand). We also anticipated that PPD would increase up to a moderate level of difficulty and then decrease at higher levels of difficulty.

Materials and methods

Twenty normal-hearing young adults participated in the study (5 males, 15 females; ages 22–37 years, mean = 27.45 years, standard deviation = 4.92 years). This number of participants was deemed appropriate based on previous studies that demonstrated significant changes to PPD as a listening task was made more challenging (Winn, Edwards, and Litovsky 2015; McMahon et al. 2016; Giuliani, Brown, and Wu 2020). Data from one subject were unusable because the eye tracker lost its lock on their eyes, leaving 19 participants in the analyses. Normal hearing was defined as pure tone air conduction thresholds of ≤20 dB HL for octave frequencies from 250 to 4000 Hz in both ears. All participants were native English speakers and had normal or corrected-to-normal vision in both eyes. Exclusion criteria included hearing loss or a reported history of psychiatric illness. All participants were consented in accordance with the procedures submitted to the Institutional Review Board at the University of Iowa and were compensated for their time.

Stimuli

IEEE sentences, phonetically balanced English sentences spoken by a single male talker, were used for speech perception testing (Institute of Electrical and Electronics Engineers 1969). The sentences (2–4 s long) were offset aligned and the beginnings were zero-padded in MATLAB (MathWorks, Natick, MA) so that each stimulus lasted exactly 5 s (Winn, Edwards, and Litovsky 2015; Giuliani, Brown, and Wu 2020). The long-term average speech spectrum of 20 different sentences taken randomly from the IEEE recordings was used to generate a custom noise stimulus. The SNRs were established by evaluating the root-mean-square amplitude of each sentence and the speech-shaped noise; these SNRs were further verified in a sound booth using a sound level meter and A-weighted sound pressure level (dBA). Noise-vocoded sentences were generated using AngelSim software. A pilot study using the IEEE sentences was conducted to select pairs of vocoded sentences and sentences-in-noise matched on average intelligibility. Sentences were presented via insert earphones, and the number of vocoder channels and SNRs were altered to manipulate sentence intelligibility. The sentences were scored as a percentage of the total words repeated correctly. Based on the pilot data, five vocoder simulations (8, 6, 5, 4 and 3 channels) and five SNRs (−3, −5, −7, −9 and −11 dB SNR) were selected because they resulted in similar word recognition abilities. Unprocessed sentences were included as a control condition because they require little effort.
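As a concrete illustration of the SNR calibration described above, speech and noise can be combined by matching root-mean-square (RMS) amplitudes. The sketch below is a minimal Python example (the study used MATLAB; all names are illustrative, not the authors' code) that holds the speech level fixed and scales the noise, one common way to realise a target SNR.

```python
import numpy as np

def rms(x):
    """Root-mean-square amplitude of a signal."""
    return np.sqrt(np.mean(np.square(x)))

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise RMS ratio equals
    `snr_db`, then return the mixture. Here the speech level is held
    fixed and the noise is scaled; other level conventions are possible."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return speech + gain * noise

# Toy example: a 1 kHz tone in white noise at -5 dB SNR.
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 1000 * t)
noise = np.random.default_rng(0).standard_normal(fs)
mixed = mix_at_snr(speech, noise, -5.0)
```

The achieved SNR can be verified by comparing the RMS of the clean speech against the RMS of the scaled noise, mirroring the sound-booth verification described above.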

Procedures

Participants were consented and their hearing was screened to verify that they met the inclusion criteria. After consent was obtained and the screening procedures were performed, the participant was seated at a table in the sound booth and insert earphones were placed. A chin rest was fastened to the table at a fixed distance of 60 cm from a computer monitor (Dell®, Round Rock, TX, USA; 16:9 aspect ratio). A Tobii X2-60 distal eye tracker (Danderyd, Sweden) with a sampling rate of 60 Hz was secured to the bottom of the computer monitor. Participants were instructed to set their chin in the chin rest and gaze at the grey fixation cross displayed on the computer monitor. The height of the chin rest and monitor were adjusted so that the participant’s eyes were in the centre of the eye tracker’s recording field. Ambient light was adjusted to a level that placed the participant’s pupil diameter in the middle of its dynamic range (eg Zekveld, Kramer, and Festen 2010; Winn, Edwards, and Litovsky 2015; Winn et al. 2018; McMahon et al. 2016). A calibration procedure was initiated; if it was unsuccessful, the participant was repositioned and the procedure was performed again. Following a successful calibration, a practice list of one block of 20 sentences was presented at a randomly selected SNR to familiarise the participant with the timing of the task; this was also done with one randomly selected block of 20 vocoded sentences.

The presentation level for the vocoded sentences and the unprocessed sentences was fixed at 65 dBA. The presentation level for the sentences-in-noise was varied to achieve the different SNRs; this was done so that the participants could not judge the difficulty based on the perceived loudness of the noise (Mackersie and Calderon-Moultrie 2016; Strand et al. 2018). The noise always began 2 s before the start of the trial to allow any reaction to the noise itself to subside and to provide the participants with uniform timing across sentences. After each sentence, there was a 2-s retention period of silence followed by the participant’s response, which was always cued using a brief 1000 Hz pure tone. Supplementary Figures 1 and 2 provide a schematic representation of a single trial for vocoded speech and sentences-in-noise, respectively. Sentences were presented in blocks of 20 at a randomly selected difficulty, resulting in a total of 220 sentences (not including the practice sentences) per participant.

Participants were instructed to listen to each sentence and repeat back what they heard after the response cue. Each sentence was scored based on the total number of words repeated correctly and no feedback was provided. The trial was advanced to the next sentence manually after the examiner scored the response, which occurred an average of 6 s (range 3.2–7 s) after the participant’s response. After the completion of each block of 20 sentences, participants were asked: “On a scale from 0 to 100, how much effort was required for you to understand the sentences?” A value of “0” indicated no effort while a value of “100” indicated maximum effort; no numerical constraints other than these floor and ceiling values were placed on the participants (as in Wu et al. 2016; Giuliani, Brown, and Wu 2020).

The presentation of the stimuli was controlled by a custom script in MATLAB®. The pupillometry responses were recorded in Attention Tool® software (iMotions Global, Boston, MA, USA; version 6.2). Participants were given breaks lasting several minutes after each block of sentences to reduce fatigue; the entire protocol lasted less than two hours, which also helps to limit fatigue (Winn et al. 2018).

Data processing and analysis

Data processing and analysis were conducted on the right eye in a manner similar to that suggested in previous studies (eg Zekveld, Kramer, and Festen 2010; Winn et al. 2018; Giuliani, Brown, and Wu 2020). First, the data were preprocessed by removing eye blinks and interpolating between the missing values. Then, the responses for each trial were smoothed using a 10-point moving average filter. Data were examined visually and any trial that contained excessive eye blinks (>15% of data, per Winn et al. 2018) was removed.
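The preprocessing steps above (blink removal, interpolation, 10-point moving-average smoothing, and rejection of trials with more than 15% missing samples) can be sketched as follows. This is a minimal Python illustration of the described pipeline (the study used MATLAB), assuming blinks are coded as NaN in the raw trace; names are illustrative.

```python
import numpy as np

def preprocess_trial(pupil, max_missing=0.15, window=10):
    """Blink handling and smoothing for one trial. `pupil` is a 1-D
    array of pupil diameters with blink samples coded as NaN. Returns
    the smoothed trace, or None if more than `max_missing` of the
    samples are blinks (trial rejected)."""
    pupil = np.asarray(pupil, dtype=float).copy()
    missing = np.isnan(pupil)
    if missing.mean() > max_missing:
        return None  # reject trials with excessive eye blinks
    # Linearly interpolate across the blink gaps.
    idx = np.arange(pupil.size)
    pupil[missing] = np.interp(idx[missing], idx[~missing], pupil[~missing])
    # 10-point moving-average filter.
    kernel = np.ones(window) / window
    return np.convolve(pupil, kernel, mode="same")
```

Note that `mode="same"` zero-pads at the edges; a real pipeline would trim or pad the trace so that the baseline and analysis windows fall away from the filter's edge effects.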

After preprocessing, BPDs were calculated for each trial. The baseline values for the unprocessed sentences and the vocoded sentences were calculated during a 1-s interval before the start of each sentence (Supplemental Figure 1). The sentences in background noise contained two extra seconds of noise before the stimuli were presented. Therefore, two baseline values were calculated for the sentences in background noise (Supplemental Figure 2). The first was calculated as the average pupil diameter during a 1-s silent period before the start of the noise (herein referred to as baseline pupil diameter, silent: BPDS); the second was calculated as the average pupil diameter during a 1-s period before the start of each sentence, while the participant listened to the noise (herein referred to as baseline pupil diameter, noise: BPDN). The baseline values were then subtracted from each trial, yielding the change in pupil dilation relative to baseline for each trial. Next, the maximum PPD was derived for each baseline-corrected response during a time window beginning 0.5 s before the end of the sentences and ending at the response prompt (Winn, Edwards, and Litovsky 2015; Winn et al. 2018; Giuliani, Brown, and Wu 2020). The PPDs calculated using the two different baselines are herein referred to as PPDS and PPDN, respectively. An automated script in MATLAB® removed any responses that exhibited a downward drift from the baseline period or any peak responses that were greater than three standard deviations from the baseline mean (per Winn et al. 2018). In all, this resulted in the rejection of approximately 4% of the individual responses for the unprocessed sentences, 29% for the vocoded sentences, 40% for the sentences-in-noise during the first baseline period, and 25% for the sentences-in-noise during the second baseline period.
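The baseline correction and PPD extraction described above can be sketched as follows: the BPD is the mean over the 1 s before stimulus onset, and the PPD is the maximum baseline-corrected value from 0.5 s before stimulus offset to the response prompt. A minimal Python illustration; the timing arguments and names are illustrative, not the authors' code.

```python
import numpy as np

def baseline_and_ppd(trace, fs, stim_onset_s, stim_offset_s, prompt_s):
    """Compute baseline pupil diameter (BPD) and peak pupil dilation
    (PPD) for one preprocessed trial. `trace` is a 1-D array of pupil
    diameters sampled at `fs` Hz; the remaining arguments give event
    times in seconds relative to the start of the trace."""
    trace = np.asarray(trace, dtype=float)
    # BPD: mean diameter over the 1-s window before stimulus onset.
    base = trace[int((stim_onset_s - 1.0) * fs):int(stim_onset_s * fs)]
    bpd = base.mean()
    # Baseline-correct the whole trace.
    corrected = trace - bpd
    # PPD: maximum from 0.5 s before stimulus offset to the prompt.
    win = corrected[int((stim_offset_s - 0.5) * fs):int(prompt_s * fs)]
    return bpd, win.max()
```

For the sentences-in-noise, this function would simply be called twice per trial, once with the silent-period baseline window (BPDS) and once with the in-noise baseline window (BPDN).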
Finally, the processed data were averaged together to create mean baseline and mean peak responses for each participant. Post-hoc analyses were performed by binning pupil responses from each trial based on the percentage of words repeated correctly for each sentence; these bins were selected to represent a range of accuracies (ie 100%, 99–72%, 71–51% and <51%; Wu et al. 2016; Giuliani, Brown, and Wu 2020). This method allows for comparisons between a clinically viable methodology (ie fixed SNRs) and methods that equate performance across individuals (ie individualised SNRs), and it separates out the effects of infrequent poor performance on listening effort. For the vocoded sentences, 23%, 21%, 13% and 43% of the pupil responses retained after processing were placed into each bin, respectively. For the sentences-in-noise, 25%, 22%, 13% and 39% of the pupil responses retained after processing were placed into each bin, respectively. The same analysis methods mentioned previously were then applied to each bin.
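The accuracy binning can be made explicit with a small helper; the bin boundaries follow the labels above, and the exact handling of edge cases (eg a 51% trial falling in the 71–51% bin) is an assumption, not stated by the authors.

```python
def accuracy_bin(pct_correct):
    """Assign a trial's word recognition accuracy (0-100) to one of
    the four bins used in the post-hoc analyses. Boundary handling
    is assumed: each bin is inclusive of its lower label."""
    if pct_correct == 100:
        return "100%"
    if pct_correct >= 72:
        return "99-72%"
    if pct_correct >= 51:
        return "71-51%"
    return "<51%"
```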

Linear mixed models were used in SPSS® to assess within-subject differences in listening effort across the input-related demands and difficulty levels. The outcome variables for each difficulty were: mean percent words correct, mean subjective listening effort rating, mean BPD, mean PPD and mean latency of PPD. An unstructured covariance was employed and “participant” was modelled as the random effect. Non-significant main effects were gradually removed until the model with the lowest Akaike’s Information Criterion was obtained. The word recognition scores and subjective ratings were not normally distributed; therefore, a rationalised arcsine transformation was applied to the data for analysis (Studebaker 1985). If a significant main effect was found, post-hoc Bonferroni corrected t-tests were performed to test for significant pairwise differences among group means. Where appropriate, Cohen’s d values were calculated to examine the relative effect size of any significant findings.
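The rationalised arcsine transform (Studebaker 1985) maps proportion-correct scores onto a scale with approximately uniform variance across the range, which is why it is applied before fitting the linear models. A minimal Python sketch of the standard formula:

```python
import numpy as np

def rau(correct, total):
    """Rationalised arcsine transform (Studebaker 1985). `correct`
    is the number of items correct out of `total`. Returns a score in
    rationalised arcsine units (RAU), which spans roughly -23 (0%
    correct) to +123 (100% correct) and is close to percent correct
    over the middle of the range."""
    theta = (np.arcsin(np.sqrt(correct / (total + 1)))
             + np.arcsin(np.sqrt((correct + 1) / (total + 1))))
    return (146.0 / np.pi) * theta - 23.0
```

Near 50% correct the RAU score is nearly identical to the raw percentage, while scores near floor and ceiling are stretched outward, stabilising the variance.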

Results

Speech recognition and subjective effort ratings

Figure 1 (Supplemental Table 1) shows speech recognition scores by difficulty for vocoded sentences, sentences-in-noise and unprocessed sentences. The data are presented in percent correct but were analysed in rationalised arcsine units. The final model revealed a significant interaction between degradation type and difficulty level (F(5, 220) = 3.37, p = 0.006). Post-hoc Bonferroni corrected t-tests revealed a significant difference in performance between the six-channel vocoder and −5 dB SNR conditions (p = 0.001). Participants scored 10.5% poorer in the −5 dB SNR condition than in the six-channel vocoder condition (Cohen’s d = 0.65). The remaining post-hoc effects were not significant (p > 0.05). There was a significant main effect of difficulty (F(5, 220) = 595.59, p < 0.001) and post-hoc Bonferroni corrected t-tests revealed that each successively more challenging condition resulted in significantly poorer speech recognition scores (p < 0.001 for all pairwise contrasts).

Figure 1.


Word recognition accuracy by paired difficulty level. The unprocessed sentences are in light green, sentences-in-noise are in grey, and the vocoded sentences are in the salmon colour. With the exception of the six-channel vocoder and −5 dB SNR conditions, there were no significant differences in performance between the pairings. Statistical measures were performed on the RAU transformed data but displayed here as a percentage ranging from 0 to 100. The “**” indicates p values <0.001.

Figure 2 (Supplemental Table 1) shows the subjective effort ratings by difficulty for vocoded sentences, sentences-in-noise and unprocessed sentences. The data are presented as a percentage but were analysed in rationalised arcsine units. The final model showed no significant interactions (p > 0.05) but demonstrated a significant main effect of difficulty level (F(5, 220) = 231.932, p < 0.001). Post-hoc Bonferroni corrected t-tests revealed that perceived listening effort increased with increasing difficulty (p < 0.001 for all pairwise combinations). There was also a significant main effect of degradation type (F(1, 220) = 250.363, p < 0.001) and post-hoc Bonferroni corrected t-tests indicated that the vocoded sentences required greater subjective effort than the sentences-in-noise (p < 0.001). On average, the vocoded sentences were rated as 12.3% (range: 9.7–15.4%; Cohen’s d = 1.10) more effortful than the sentences-in-noise, despite statistically similar word recognition accuracy.

Figure 2.


Subjective listening effort (%) by paired difficulty level. The unprocessed sentences are in light green, sentences-in-noise are in grey, and the vocoded sentences are in the salmon colour. The vocoded sentences were rated as significantly more effortful than sentences-in-noise. Statistical measures were performed on the RAU transformed data but displayed here as a percentage ranging from 0 to 100. The “**” indicates p values <0.001.

Baseline pupil diameter

Supplemental Table 1 also displays the mean BPD for the vocoded, noise and unprocessed sentences. The values for the sentences-in-noise are divided into two sub-columns: the left sub-column indicates the BPDS and the right sub-column indicates the BPDN. The final model showed no significant interactions (p > 0.05) but demonstrated a significant main effect of baseline period (F(3, 265.033) = 5.160, p = 0.002), and post-hoc Bonferroni corrected t-tests revealed that mean BPD for the vocoded sentences was significantly smaller than the BPDS (p = 0.044; Cohen’s d = 0.66) and the BPDN (p = 0.001; Cohen’s d = 0.99). There were no significant differences between BPDS and BPDN (p > 0.05). There were no significant differences in BPD between the unprocessed sentences and the vocoded sentences, nor between the unprocessed sentences and the sentences-in-noise (p > 0.05). The same model indicated no significant differences in BPD due to manipulation of task difficulty (p > 0.05 for all combinations).

Peak pupil dilation

Supplemental Table 1 also displays PPD for the vocoded, noise and unprocessed sentences. The final model showed no significant interactions (p > 0.05) but demonstrated a significant main effect of degradation type on PPD (F(3, 272.088) = 41.943, p < 0.001), and post-hoc Bonferroni corrected t-tests revealed that PPD was significantly larger for the vocoded sentences than the PPDN (p < 0.001; Cohen’s d = 1.43) and the PPDS (p = 0.016; Cohen’s d = 0.43); the PPDS was significantly greater than the PPDN (p < 0.001; Cohen’s d = 1.00); and the PPD for the unprocessed sentences was significantly smaller than that for the vocoded sentences, the PPDN and the PPDS (p < 0.001 for all combinations; Cohen’s d = 2.67, 1.33 and 2.27, respectively). While the PPDS values were significantly larger than the PPDN values, both were significantly smaller than the PPD values for the vocoded sentences. Therefore, the PPD values for the sentences-in-noise are presented henceforth as the mean of PPDS and PPDN. The same model indicated no significant differences in PPD due to manipulation of task difficulty (p > 0.05 for all combinations).

Figure 3 displays the grand mean change in peak pupil dilation by time, aligned to the sentence offset. The dashed vertical lines represent the analysis window, which always occurred 0.5 s before the end of each sentence and ended at the response prompt 2 s after stimulus offset (ie during the retention period). The general pattern was that PPD increased rapidly from baseline, peaked during the retention period and then decreased before the response cue. PPD was significantly larger for the vocoded sentences (salmon-coloured) than the noise (grey) and unprocessed (green) conditions.

Figure 3.


Change in PPD from baseline by time. The dashed vertical lines represent the analysis window. The green line represents the mean PPD for the unprocessed sentences, the grey line represents the mean PPD for the sentences-in-noise, and the salmon line represents the mean PPD for the vocoded sentences. Mean PPD for the vocoded sentences was significantly greater than mean PPD for the unprocessed sentences and sentences-in-noise. Mean PPD for the sentences-in-noise was significantly greater than mean PPD for the unprocessed sentences. While PPD values for the two baseline periods for the sentences-in-noise were significantly different, they are presented here as the average because both were significantly smaller than the vocoded sentences.

Peak pupil dilation latency

Figure 4 (Supplemental Table 1) displays the latencies of the PPD for the vocoded sentences, sentences-in-noise and unprocessed sentences. The final model showed no significant interactions (p > 0.05) but demonstrated a significant effect of degradation type (F(2, 149.54) = 27.46, p < 0.001), and post-hoc Bonferroni corrected t-tests revealed that the PPD latency for the sentences-in-noise was 0.63 s longer than the PPD latency for the vocoded sentences (p < 0.001; Cohen’s d = 4.21) but not significantly longer than the PPD latency for the unprocessed sentences (p > 0.05); however, the PPD latency for the unprocessed sentences was 0.33 s longer than the PPD latency for the vocoded sentences (p = 0.04; Cohen’s d = 1.64).

Figure 4.


PPD latency values. The green bar represents the latency of the PPD for the unprocessed sentences, the grey bar represents the latency of the PPD for the sentences-in-noise, and the salmon bar represents the latency of the PPD for the vocoded sentences. The latency of the PPD was significantly shorter for the vocoded sentences than the sentences-in-noise and the unprocessed sentences. The “*” and “**” represent significance at p < 0.05, p < 0.01, respectively.

Peak pupil dilation and latency by percent correct

Figure 5 (Supplemental Table 2) shows change in PPD by word recognition accuracy bin (accuracy for the unprocessed sentences was deliberately excluded because performance was at ceiling and would skew the data). The final model indicated a significant interaction between degradation type and word recognition accuracy (F(3, 129.12) = 3.6, p = 0.015), and post-hoc Bonferroni corrected t-tests revealed several distinctions: (1) within the vocoded conditions (salmon colour), PPD was significantly different between the 100% and 99–72% bins (p < 0.001; Cohen’s d = 1.78), between the 100% and <51% bins (p ≤ 0.001; Cohen’s d = 2.44), and between the 71–51% and <51% word recognition accuracy bins (p < 0.001; Cohen’s d = 1.78); and (2) PPD was significantly greater during vocoded sentences than sentences-in-noise for the 99–72% and <51% word recognition accuracy bins (p < 0.001; Cohen’s d = 0.76 and 1.15, respectively).

Figure 5.


Change in PPD from baseline by word recognition accuracy. The salmon-coloured brackets connect the means with significant differences within the vocoded sentences and the black brackets connect the means with significant differences between the vocoded sentences and the sentences-in-noise. Mean PPD for responses in the 99–72% and the <51% accuracy bins was significantly greater than that in the 100% accuracy bin; mean PPD for responses in the <51% accuracy bin was also significantly greater than that in the 71–51% accuracy bin. Mean PPD for the vocoded sentences was significantly greater than mean PPD for the sentences-in-noise for the 99–72% and the <51% word recognition accuracy bins. The “*” and “**” represent significance at p < 0.05 and p < 0.01, respectively.

Supplemental Table 2 also lists the latencies of the PPD (accuracy for the quiet sentences was deliberately excluded because performance was essentially at ceiling). The final model indicated a significant effect of degradation type (F(1, 133) = 22.82, p < 0.001; Cohen’s d = 1.30), with the mean PPD latency for the sentences-in-noise occurring approximately 0.53 s later than that for the vocoded sentences.

Discussion

This study examined how different input-related demands influence pupillometric and subjective measures of listening effort. We reasoned that if word recognition ability was the same across the two conditions (vocoded sentences and sentences-in-noise), differences in objective and subjective listening effort were indicative of the different input-related demands imposed by the different stimuli. Consistent with our predictions, subjective effort ratings, BPD, PPD and PPD latency were significantly different for the vocoded sentences than for the sentences-in-noise despite similar speech perception performance; these effects were large and not attributable to differences in word understanding or analysis methods. Contrary to our predictions, pupillometric outcomes did not change significantly as the stimuli were made more challenging; however, we did observe significant differences in the pupillometric outcomes when the results were analysed by word recognition accuracy, but only for the vocoded sentences. Our findings further corroborate earlier studies (eg Winn and Teece 2022; Fiedler et al. 2021) demonstrating that pupillometry is sensitive to differences in listening effort evoked by factors that extend beyond word recognition ability.

Pupillometric findings support differences in input-related demands

Several studies note that degradation of a speech signal can occur at any point (at the source, during transmission, or during reception) and that these different forms of degradation will require different cognitive mechanisms (eg increased listening effort) to facilitate understanding (eg Mattys et al. 2012; Pichora-Fuller et al. 2016; Peelle 2018; Rönnberg, Holmer, and Rudner 2019). We reasoned that the large effects we observed (Figure 3) in the absence of differences in word understanding (Figure 1) were due to differences in how the sentences were degraded; this effect remained when PPD was examined at high and low levels of word recognition accuracy (99–72% and <51%; Figure 5). We suggest that the additional listening effort exerted to understand vocoded speech was due to source factor manipulations, while the listening effort exerted to understand speech-in-noise was due purely to transmission factor manipulations.

Noise-vocoded speech preserves the temporal envelope of speech but discards the spectral components whereas steady-state noise “fills-in” the temporal envelope. The preserved temporal patterns of vocoded speech may have been recognised by our participants as a familiar speech pattern, but one that did not conform to their expectations, thus resulting in increased listening effort (eg Peelle 2018; Rönnberg, Holmer, and Rudner 2019).

Typically, PPD latency is longer for more challenging stimuli, but in our findings the vocoded sentences showed not only larger PPD values but also shorter PPD latencies than the sentences-in-noise. This may imply that the listening effort required for source factor manipulations was less demanding to initiate but more effortful to sustain, or that the latency differences were simply a byproduct of a consistent noise cue before stimulus onset. More investigation is needed to confirm or refute these suggestions.

No indication of fatigue

We found that BPD was significantly smaller during the vocoded sentences than during the sentences-in-noise. A recent study demonstrated that a smaller BPD was associated with greater fatigue in older adults with and without hearing loss (Alhanbali et al. 2021), but we do not believe fatigue was an issue in the present study of young adults. Wang et al. (2018) suggested that fatigue leads to disengagement from a task and, in turn, a reduction in PPD. While we observed a smaller BPD during vocoded speech, we also observed a larger PPD compared to the sentences-in-noise, which is not consistent with Wang and colleagues’ observation. Additionally, Aston-Jones and Cohen (2005) found that a smaller BPD and a larger PPD were suggestive of task engagement, which supports our findings and suggests that fatigue was not a dominant concern for our participants. Finally, providing the participants with breaks between the different blocks could have also limited fatigue.

Subjective ratings of effort, but not speech perception, are affected by input-related demands

Moore and Picou (2018) suggested that participants who have a difficult time understanding the concept of effort will rate something that is more concrete, such as their performance on the task. In the present study, word recognition abilities were statistically similar across the vocoded sentences and sentences-in-noise, yet the subjective effort ratings for the vocoded sentences were globally greater. Therefore, word understanding alone does not fully explain why the participants rated the vocoded sentences as more effortful. We propose expanding on the findings of Moore and Picou (2018) by suggesting that subjective listening effort is also influenced by the input-related demands of the speech material.

Findings not due to choice of baseline period

To further demonstrate that these findings were due to differences in input-related demands and not to differences in analytical methods, we examined the influence of different baseline periods on the PPD response. BPD is typically assessed during a short window before the onset of a stimulus. In the present study, this meant that BPD values for the vocoded sentences were obtained during a quiet period while BPD values for the sentences-in-noise were obtained while noise was present. The mere presence of noise alters pupil dynamics and may lead to a different BPD, which in turn may affect PPD and peak latency values (Winn et al. 2018). Therefore, we examined PPD for the sentences-in-noise relative to two baseline periods: one that began during silence (PPDS) and one that began while the participant listened to the noise (PPDN). These different analysis windows did not significantly affect BPD, yet PPDS was significantly larger than PPDN. This is best explained by the fact that the BPD taken in quiet was numerically smaller, which resulted in larger changes relative to baseline for the PPDS measures. Put more simply, baseline pupil diameter is measured in millimetres while PPD is measured in tenths of a millimetre; given this difference in scale, even a small, non-significant difference in BPD can significantly alter the PPD. However, pupillometric values for the vocoded sentences were significantly different from the sentences-in-noise irrespective of which baseline period was used. Given this information, we chose to average the pupillometrics from both baseline periods for the sentences-in-noise; the literature would nevertheless benefit from further discussion of how best to account for such differences when examining stimuli constructed by manipulating different input-related demands.
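To make the scale argument concrete, the following sketch of subtractive baseline correction uses hypothetical numbers (not the study’s data) to show how a small shift in baseline diameter carries one-for-one into the PPD estimate.

```python
import numpy as np

def ppd_metrics(trace, t, baseline_win, fs=60):
    """Subtractive baseline correction: PPD is the maximum of the trace
    after subtracting the mean diameter over the baseline window (s)."""
    i0, i1 = (int(edge * fs) for edge in baseline_win)
    bpd = trace[i0:i1].mean()
    corrected = trace - bpd
    return bpd, corrected.max(), t[np.argmax(corrected)]

# Hypothetical trace (mm): ~3 mm resting diameter, a small 0.05 mm
# rise after noise onset at 1 s, and a 0.3 mm task-evoked dilation.
fs = 60
t = np.arange(0, 6, 1 / fs)
trace = 3.0 + 0.05 * (t > 1.0) + 0.3 * np.exp(-((t - 3.5) ** 2) / 0.5)

bpd_s, ppd_s, _ = ppd_metrics(trace, t, (0.0, 1.0))  # baseline in silence
bpd_n, ppd_n, _ = ppd_metrics(trace, t, (1.5, 2.5))  # baseline in noise
# The baseline difference (hundredths of a mm) is negligible next to a
# ~3 mm BPD but carries one-for-one into a PPD of only ~0.3 mm.
```

Because the same peak is referenced to both baselines, the difference between the two PPD estimates equals the baseline difference exactly, which is why a non-significant shift in BPD can still produce a significant shift in PPD.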

Potential concerns

We did not anticipate non-significant findings for PPD within the vocoded sentences and the sentences-in-noise as the difficulty increased. We analysed the pupil responses following the suggestions of Winn et al. (2018) by implementing an automated script to remove individual trials containing downward-sloping pupil responses and other anomalies (eg outliers). Our main justification for this was to reduce the potential for subjective biases to influence our analysis; however, in so doing we likely removed more responses (eg 40% of responses during the BPDS) than would leave the 16–18 clean recordings recommended by Winn et al. (2018), thereby reducing statistical power and the ability to detect small differences in PPD. This finding underscores the importance of balancing automated, objective pupil analyses against the need to maintain adequate statistical power, even when comparing stimuli that are degraded using different methods.
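An automated rejection step of the kind described might look like the sketch below. The specific criteria (a negative overall slope across the analysis window, and z-scored peak outliers) are illustrative assumptions, not the study’s actual script.

```python
import numpy as np

def clean_trials(trials, t, z_thresh=3.0):
    """Flag trials for removal: an overall downward slope across the
    analysis window, or a peak that is an outlier relative to the
    other trials (illustrative criteria only)."""
    slopes = np.array([np.polyfit(t, trial, 1)[0] for trial in trials])
    peaks = np.array([np.max(trial) for trial in trials])
    z = (peaks - peaks.mean()) / (peaks.std() + 1e-12)
    keep = (slopes >= 0) & (np.abs(z) < z_thresh)
    return [trial for trial, k in zip(trials, keep) if k], keep

# Synthetic demo: 18 well-behaved traces, one downward-sloping trace,
# and one with an implausibly large peak.
t = np.linspace(0.0, 6.0, 100)
trials = [0.1 * t + 0.01 * i for i in range(18)]
trials += [-0.2 * t, 0.1 * t + 5.0]
kept, keep = clean_trials(trials, t)
# Both anomalous trials are flagged; 18 clean recordings remain.
```

The trade-off discussed above is visible here: tightening `z_thresh` makes rejection more objective and aggressive, but each removed trial also removes statistical power.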

Summary

In summary, our findings support our hypothesis that pupillometric and subjective measures of listening effort are modulated by different input-related demands: vocoded sentences by source factor manipulations and sentences-in-noise by transmission factor manipulations. Pupillometric differences were further emphasised when responses were analysed based on word recognition accuracy, but only for the vocoded sentences. Future studies should examine clinical interventions that address source factors (eg auditory training) and digital algorithms that address transmission factors (eg noise reduction, directional microphones) as means of reducing listening effort and improving patient outcomes. Further investigation is also needed to fine-tune the balance between automated pupil analyses and maintaining adequate statistical power.

Supplementary Material

Funding

Funding was provided by NIDILRR (National Institute on Disability, Independent Living, and Rehabilitation Research) 90REGE0013. The content of this article does not represent the views of the United States government or the Department of Veterans Affairs.

Footnotes

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  1. Alhanbali S, Dawes P, Millman RE, and Munro KJ. 2019. “Measures of Listening Effort Are Multidimensional.” Ear and Hearing 40 (5): 1084–1097. doi: 10.1097/AUD.0000000000000697.
  2. Alhanbali S, Munro KJ, Dawes P, Carolan PJ, and Millman RE. 2021. “Dimensions of Self-reported Listening Effort and Fatigue on a Digits-in-Noise Task, and Association with Baseline Pupil Size and Performance Accuracy.” International Journal of Audiology 60 (10): 762–772. doi: 10.1080/14992027.2020.1853262.
  3. Aston-Jones G, and Cohen JD. 2005. “An Integrative Theory of Locus Coeruleus-norepinephrine Function: Adaptive Gain and Optimal Performance.” Annual Review of Neuroscience 28: 403–450. doi: 10.1146/annurev.neuro.28.061604.135709.
  4. Fiedler L, Ala TS, Graversen C, Alickovic E, Lunner T, and Wendt D. 2021. “Hearing Aid Noise Reduction Lowers the Sustained Listening Effort During Continuous Speech in Noise—A Combined Pupillometry and EEG Study.” Ear and Hearing 42 (6): 1590–1601. doi: 10.1097/AUD.0000000000001050.
  5. Francis AL, Bent T, Schumaker J, Love J, and Silbert N. 2021. “Listener Characteristics Differentially Affect Self-reported and Physiological Measures of Effort Associated with Two Challenging Listening Conditions.” Attention, Perception, & Psychophysics 83 (4): 1818–1841. doi: 10.3758/s13414-020-02195-9.
  6. Gilzenrat MS, Nieuwenhuis S, Jepma M, and Cohen JD. 2010. “Pupil Diameter Tracks Changes in Control State Predicted by the Adaptive Gain Theory of Locus Coeruleus Function.” Cognitive, Affective, & Behavioral Neuroscience 10 (2): 252–269. doi: 10.3758/CABN.10.2.252.
  7. Giuliani NP, Brown CJ, and Wu YH. 2020. “Comparisons of the Sensitivity and Reliability of Multiple Measures of Listening Effort.” Ear and Hearing 42 (2): 465.
  8. Institute of Electrical and Electronics Engineers. 1969. “IEEE Recommended Practice for Speech Quality Measurements.” IEEE Transactions on Audio and Electroacoustics 17 (3): 225–246. doi: 10.1109/TAU.1969.1162058.
  9. Kahneman D. 1973. Attention and Effort. Hoboken: Prentice-Hall, Inc.
  10. Kahneman D, Tursky B, Shapiro D, and Crider A. 1969. “Pupillary, Heart Rate, and Skin Resistance Changes during a Mental Task.” Journal of Experimental Psychology 79 (1): 164–167. doi: 10.1037/h0026952.
  11. Kramer SE, Lorens A, Coninx F, Zekveld AA, Piotrowska A, and Skarzynski H. 2013. “Processing Load during Listening: The Influence of Task Characteristics on the Pupil Response.” Language and Cognitive Processes 28 (4): 426–442. doi: 10.1080/01690965.2011.642267.
  12. Liu Y, Rodenkirch C, Moskowitz N, Schriver B, and Wang Q. 2017. “Dynamic Lateralization of Pupil Dilation Evoked by Locus Coeruleus Activation Results from Sympathetic, Not Parasympathetic, Contributions.” Cell Reports 20 (13): 3099–3112. doi: 10.1016/j.celrep.2017.08.094.
  13. Mackersie CL, and Calderon-Moultrie N. 2016. “Autonomic Nervous System Reactivity during Speech Repetition Tasks: Heart Rate Variability and Skin Conductance.” Ear and Hearing 37 (1): 118S–125S. doi: 10.1097/AUD.0000000000000305.
  14. Mattys SL, Davis MH, Bradlow AR, and Scott SK. 2012. “Speech Recognition in Adverse Conditions: A Review.” Language and Cognitive Processes 27 (7–8): 953–978. doi: 10.1080/01690965.2012.705006.
  15. McGarrigle R, Dawes P, Stewart AJ, Kuchinsky SE, and Munro KJ. 2017. “Pupillometry Reveals Changes in Physiological Arousal during a Sustained Listening Task.” Psychophysiology 54 (2): 193–203. doi: 10.1111/psyp.12772.
  16. McLaughlin DJ, and Van Engen KJ. 2020. “Task-evoked Pupil Response for Accurately Recognized Accented Speech.” The Journal of the Acoustical Society of America 147 (2): EL151–EL156. doi: 10.1121/10.0000718.
  17. McMahon CM, Boisvert I, de Lissa P, Granger L, Ibrahim R, Lo CY, Miles K, and Graham PL. 2016. “Monitoring Alpha Oscillations and Pupil Dilation across a Performance-intensity Function.” Frontiers in Psychology 7: 745. doi: 10.3389/fpsyg.2016.00745.
  18. Moore TM, and Picou EM. 2018. “A Potential Bias in Subjective Ratings of Mental Effort.” Journal of Speech, Language, and Hearing Research 61 (9): 2405–2421. doi: 10.1044/2018_JSLHR-H-17-0451.
  19. Peelle JE. 2018. “Listening Effort: How the Cognitive Consequences of Acoustic Challenge Are Reflected in Brain and Behavior.” Ear and Hearing 39 (2): 204–214. doi: 10.1097/AUD.0000000000000494.
  20. Pichora-Fuller MK, Kramer SE, Eckert MA, Edwards B, Hornsby BW, Humes LE, Lemke U, et al. 2016. “Hearing Impairment and Cognitive Energy: The Framework for Understanding Effortful Listening (FUEL).” Ear and Hearing 37 (1): 5S–27S. doi: 10.1097/AUD.0000000000000312.
  21. Rajkowski J, Kubiak P, and Aston-Jones G. 1994. “Locus Coeruleus Activity in Monkey: Phasic and Tonic Changes Are Associated with Altered Vigilance.” Brain Research Bulletin 35 (5–6): 607–616. doi: 10.1016/0361-9230(94)90175-9.
  22. Rönnberg J, Holmer E, and Rudner M. 2019. “Cognitive Hearing Science and Ease of Language Understanding.” International Journal of Audiology 58 (5): 247–261. doi: 10.1080/14992027.2018.1551631.
  23. Steinhauer SR, Siegle GJ, Condray R, and Pless M. 2004. “Sympathetic and Parasympathetic Innervation of Pupillary Dilation during Sustained Processing.” International Journal of Psychophysiology 52 (1): 77–86. doi: 10.1016/j.ijpsycho.2003.12.005.
  24. Strand JF, Brown VA, Merchant MB, Brown HE, and Smith J. 2018. “Measuring Listening Effort: Convergent Validity, Sensitivity, and Links with Cognitive and Personality Measures.” Journal of Speech, Language, and Hearing Research 61 (6): 1463–1486. doi: 10.1044/2018_JSLHR-H-17-0257.
  25. Studebaker GA. 1985. “A ‘Rationalized’ Arcsine Transform.” Journal of Speech and Hearing Research 28 (3): 455–462. doi: 10.1044/jshr.2803.455.
  26. Wang Y, Naylor G, Kramer SE, Zekveld AA, Wendt D, Ohlenforst B, and Lunner T. 2018. “Relations between Self-reported Daily-life Fatigue, Hearing Status, and Pupil Dilation during a Speech Perception in Noise Task.” Ear and Hearing 39 (3): 573–582. doi: 10.1097/AUD.0000000000000512.
  27. Wendt D, Hietkamp RK, and Lunner T. 2017. “Impact of Noise and Noise Reduction on Processing Effort: A Pupillometry Study.” Ear and Hearing 38 (6): 690–700. doi: 10.1097/AUD.0000000000000454.
  28. Winn MB, Edwards JR, and Litovsky RY. 2015. “The Impact of Auditory Spectral Resolution on Listening Effort Revealed by Pupil Dilation.” Ear and Hearing 36 (4): e153–e165. doi: 10.1097/AUD.0000000000000145.
  29. Winn MB, and Teece KH. 2022. “Effortful Listening Despite Correct Responses: The Cost of Mental Repair in Sentence Recognition by Listeners with Cochlear Implants.” Journal of Speech, Language, and Hearing Research 65 (10): 3966–3980.
  30. Winn MB, Wendt D, Koelewijn T, and Kuchinsky SE. 2018. “Best Practices and Advice for Using Pupillometry to Measure Listening Effort: An Introduction for Those Who Want to Get Started.” Trends in Hearing 22: 2331216518800869. doi: 10.1177/2331216518800869.
  31. Wu YH, Stangl E, Zhang X, Perkins J, and Eilers E. 2016. “Psychometric Functions of Dual-task Paradigms for Measuring Listening Effort.” Ear and Hearing 37 (6): 660–670. doi: 10.1097/AUD.0000000000000335.
  32. Zekveld AA, Heslenfeld DJ, Johnsrude IS, Versfeld NJ, and Kramer SE. 2014. “The Eye as a Window to the Listening Brain: Neural Correlates of Pupil Size as a Measure of Cognitive Listening Load.” NeuroImage 101: 76–86. doi: 10.1016/j.neuroimage.2014.06.069.
  33. Zekveld AA, Koelewijn T, and Kramer SE. 2018. “The Pupil Dilation Response to Auditory Stimuli: Current State of Knowledge.” Trends in Hearing 22: 2331216518777174. doi: 10.1177/2331216518777174.
  34. Zekveld AA, Kramer SE, and Festen JM. 2010. “Pupil Response as an Indication of Effortful Listening: The Influence of Sentence Intelligibility.” Ear and Hearing 31 (4): 480–490. doi: 10.1097/AUD.0b013e3181d4f251.
