PLoS One. 2022 Dec 2;17(12):e0278506. doi: 10.1371/journal.pone.0278506

Luminance effects on pupil dilation in speech-in-noise recognition

Yue Zhang 1,2,3,*,#, Florian Malaval 1,#, Alexandre Lehmann 1,2,3,#, Mickael L D Deroche 1,2,3,4,#
Editor: Sebastiaan Mathôt5
PMCID: PMC9718387  PMID: 36459511

Abstract

There is an increasing interest in the field of audiology and speech communication to measure the effort that it takes to listen in noisy environments, with obvious implications for populations suffering from hearing loss. Pupillometry offers one avenue to make progress in this enterprise but important methodological questions remain to be addressed before such tools can serve practical applications. Typically, cocktail-party situations may occur in less-than-ideal lighting conditions, e.g. a pub or a restaurant, and it is unclear how robust pupil dynamics are to luminance changes. In this study, we first used a well-known paradigm where sentences were presented at different signal-to-noise ratios (SNR), all conducive of good intelligibility. This enabled us to replicate findings, e.g. a larger and later peak pupil dilation (PPD) at adverse SNR, or when the sentences were misunderstood, and to investigate the dependency of the PPD on sentence duration. A second experiment reiterated two of the SNR levels, 0 and +14 dB, but measured at 0, 75, and 220 lux. The results showed that the impact of luminance on the SNR effect was non-monotonic (sub-optimal in darkness or in bright light), and as such, there is no trivial way to derive pupillary metrics that are robust to differences in background light, posing considerable constraints for applications of pupillometry in daily life. Our findings raise an under-examined but crucial issue when designing and understanding listening effort studies using pupillometry, and offer important insights to future clinical application of pupillometry across sites.

Introduction

Within hearing research, pupillometry has been shown to be a valid tool for quantifying listening effort in different listening conditions, such as with different masking noise, spectral degradation, speech intelligibility level and syntactic complexity [1–4]. Typically, when a speech recognition task gets difficult, listeners show a greater task-evoked pupillary response, until the task is so challenging that listeners ‘give up’. For instance, one of the most investigated factors on listening effort is the type and level of masking noise [5–10]. Ohlenforst et al. [8] examined peak pupil dilation (PPD) across a wide range of SNRs (-20 dB to +16 dB) in stationary noise and single-talker maskers. Results showed an inverse U-shaped relation between PPD and masking noise: as SNR decreased, listeners exhibited a bigger PPD until the task was so difficult that PPD was reduced again at the most adverse SNR. This highlights the non-monotonic nature of the pupil dynamics as a function of task difficulty.

While pupil dilation serves as a robust ‘reporter variable’ for listening effort, it is also sensitive to many other factors, among which luminosity variation is the most prominent [11,12]. The pupil size is controlled by two antagonistic smooth muscle groups, the iris sphincter and dilator muscles. When light falls on the retina, an increased neural activity in the pretectal regions and stimulation of the Edinger-Westphal nucleus leads to activation of preganglionic parasympathetic neurons and innervation of the ciliary ganglion [13,14]. These, in turn, command the constrictor muscles to tighten and lead to pupil constriction. Under the direct control of the autonomic nervous system (ANS), the pupillary response to light reflects the balance between the Sympathetic Nervous System (SNS) and the Parasympathetic Nervous System (PNS). While the range of pupillary movement in response to luminance levels can vary from less than 1mm to more than 9mm, in comparison, the largest of cognitively driven movements are about 0.5mm [1,15]. This difference in the pupillary response demonstrates that the light reaction has a much larger effect on the pupil size dynamic range than the cognitive pupillary component [16,17]. Therefore, it is important to identify and disentangle the impact of light on the pupillary response during a cognitive task, in order to validate pupillary response as a robust index for cognitive effort in ecological and likely more complex environments [18,19].

Past studies have indicated that not only is the light-induced response larger than the cognitive modulation of the pupillary response overall, but there is also an interaction between the two. However, past studies suggested inconclusive, sometimes contradictory, results. For instance, Steinhauer et al., [20] conducted two arithmetic tasks (continuously subtracting a random number by 7, i.e. difficult condition; or adding by 1, i.e. easy condition) in either dark or moderate room light. During the baseline period, no interaction was found between task difficulty and light condition. But in the response period, there was a significant interaction: the two tasks (difficulty levels) did not differ in darkness but did in moderate light (greater PPD when more difficult). Thus, pupillary changes observed during their cognitive task decreased in dark lighting conditions. Peysakhovich et al., [17] conducted a short-term memory task where participants were asked to either recall or not recall a series of auditorily presented digits. Task difficulty was controlled by the number of digits (5, 7 or 9 digits). Screen luminance changed from trial to trial among black, gray, or white. No interaction between task difficulty and light condition was observed for the baseline period, but in darker conditions, a given memory load induced higher PPD. Thus, pupillary changes observed during their cognitive task were increased by dark lighting conditions. Peysakhovich et al., [21] required participants to perform a N-back recall task coupled with an arithmetic task. Participants either added or subtracted two numbers displayed on the screen and had to respond whether the number matched the result from one block back or two blocks back. The screen was either gray (low light) or white (high light). 
This time, no effect of light and no interaction with task difficulty were found on the PPD, but differences in baseline pupil diameter (pre-task) between the 1-back and 2-back tasks were observed, and they were larger in low light. Larger effects of cognitive arousal on pupil size in the low range of luminances were also reported by Pan et al. [22]. Participants continuously solved auditory math problems that were either Easy or Hard (4s for listening to the question and 2s for keyboard response), while different luminance levels cycled on the screen (60s for each level presentation) in each task block. The Hard condition produced larger mean pupil diameters than the Easy condition across luminance levels. However, the differences between the Hard and Easy conditions were larger at low and mid luminances. Based on their results, the authors recommended mid luminances for pupillometry studies investigating pupillary responses evoked by cognitive events. Książek et al. [23] applied several analysis methods (single-value measures in the time and frequency domain, pupil time course analysis) to two data sets (data set A investigating the impact of SNR and data set B [24] investigating the impact of luminance). Results showed a significant effect of luminance in all investigated pupillary measures. This study, however, did not address the possibility of an interaction between SNR and luminance, given the fixed level of performance in data set B. Typically, studies investigating listening effort manipulate task difficulty levels (i.e., SNR levels, background noise types, hearing aid/cochlear implant features turned on/off, SRT levels etc.) to observe changes in pupil responses.
While tasks like sentence recognition and arithmetic tasks arguably require transient investment of cognitive resources, tasks like digits recall and matching require constant straining of mental effort, and more complex tasks such as speech communication in ecological rooms require concurrent and sustained effort. Different tasks require different types of resources. As shown in previous studies, pupillary responses to concurrent cognitive tasks showed different patterns compared to single cognitive task, therefore, it is reasonable to assume that interaction between complex cognitive demands and luminance will be even more complicated [18,23,25]. Finally, to add yet another level of complexity, many findings on this very question observed within the normal hearing (NH) population may not be directly applicable to the hearing-impaired (HI) community. For example, Wang et al., [24] measured SRTs for NH and HI participants and showed that participants with better hearing acuity showed a larger difference in PPD between dark and light conditions. Participants in Pan et al. [2022] (age 18–35; no precise checks on hearing status for the auditory math task) showed heterogeneity in the luminance at which the biggest pupillary difference between Easy and Hard condition occurred. The causes of this heterogeneity are yet to be systematically examined (i.e., hearing status, age etc.). In summary, how luminance affects pupillary response during speech communication is still under-investigated, and this knowledge is important for the validity of pupillometry in clinical settings where different clinics might conduct pupillometry in different luminance levels and patients vary greatly in their pupillary dynamics.

The primary aim of the current study was to examine the impact of light level on the pupillary response using a well-replicated paradigm in listening effort research that varied task difficulty by manipulating the SNR during sentence recognition. To optimize this investigation, a preliminary experiment explored a range of four SNRs, 0, +7, +14 dB, and quiet condition. This served as a replication phase to ensure that the results were consistent with past listening effort studies (i.e. making sure our glasses/equipment functioned properly). It was also an opportunity to explore the robustness of different metrics or methods which might help overcome less-than-ideal lighting conditions. For example, pupillary response is traditionally examined by extracting a feature from the overall pupil trace (e.g., PPD amplitude or latency) or by fitting the pupil variation over time (e.g., growth curve analysis, generalised additive mixed modelling, etc). It is likely that both approaches could be confounded by different luminances. For instance, when PPD is calculated by subtracting or dividing the peak dilation by the baseline, past studies assumed that they were two independent components during a cognitive task [1], but this assumption is questionable given that light could have a differential effect on the baseline and peak window of the pupil dilation [21]. So, we considered different analysis methods (baseline subtraction, proportional change) to investigate whether the effect of luminance or its interaction with task difficulty would depend on the analytical approach considered. Different from Książek et al., [23], we will focus on PPD as the index of listening effort, due to its wide application in both research and clinical studies. In summary, the current study will explore the impact of luminance on pupillometry and highlight its importance for both the experimental design and analysis methods.

Materials and methods

Participants

Twenty-one listeners (11 women; 10 men) were recruited in Exp.1 from 18 to 49 years of age, with a mean (SD) of 27.3 (8.6) years. Thirty-one listeners were initially recruited for Exp.2, but three were eventually excluded (due to excessive blinks–section 3.2), reducing the sample size to 28 (21 women; 7 men) aged 18 to 51 years old, with a mean (SD) of 27.7 (9.6) years. A pure tone audiometry was administered to ensure that all participants had binaural thresholds at or better than 25 dB HL at 0.25, 0.5, 1, 2, 4, 8 kHz. All participants were native speakers of either French or English (the study being run always in their native language). This work received ethical approvals from McGill University Faculty of Medicine Research Ethics Board under the reference A05-B11-18B. Prior to the experiment, participants were given enough time to read the protocol and gave written informed consent for their participation, receiving $15 per experiment as compensation for their time.

Stimuli

Speech stimuli were phonetically balanced sentences from the Institute of Electrical and Electronics Engineers (IEEE) corpus [26], recorded from a male native American English speaker, and sentences from the Hearing in Noise Test (HINT), recorded from a male Quebecois French speaker [27]. In all conditions except the quiet one, sentences were masked by speech-shaped noise. This noise was generated from the long-term excitation pattern of the entire material, respectively in English or French, and was always fixed at 65 dB SPL. Experiment 1 varied the SNR level, using 0, +7, +14 dB and quiet conditions. Experiment 2 varied both SNR and light levels, in which case only the 0 and +14 dB SNR were selected (based on the results of Exp.1). Changes in SNR were implemented by raising the target level (from 65 dB SPL at 0 dB SNR up to 79 dB SPL at +14 dB SNR, and back to 65 dB SPL in quiet). Note that the reason to change the target level rather than the masker level was to prevent listeners from anticipating the difficulty of a block before hearing the target [9].
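The level arithmetic described above can be illustrated with a short sketch. It assumes that SNR in dB is simply the target level minus the masker level, with the masker fixed at 65 dB SPL as stated; the function name `target_level_db` and the use of `None` for the quiet condition are hypothetical conventions, not part of the authors' Matlab software.

```python
MASKER_LEVEL_DB = 65.0  # speech-shaped noise level stated in the text (dB SPL)
QUIET_TARGET_DB = 65.0  # target level stated in the text for the quiet condition


def target_level_db(snr_db, masker_level_db=MASKER_LEVEL_DB):
    """Return the target speech level (dB SPL) for a given SNR.

    Assumption: SNR (dB) = target level - masker level, masker held fixed.
    snr_db=None stands for the quiet condition (no masker).
    """
    if snr_db is None:
        return QUIET_TARGET_DB
    return masker_level_db + snr_db


# Conditions of Exp.1: 0, +7, +14 dB SNR and quiet.
for condition in (0, 7, 14, None):
    label = "quiet" if condition is None else f"{condition:+d} dB SNR"
    print(label, "->", target_level_db(condition), "dB SPL")
```

Under this assumption the +7 dB condition corresponds to a 72 dB SPL target, a value that follows from the arithmetic rather than from the text.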

In experiment 2, three light levels were applied, by adjusting the room light level and screen luminance level together to reach close-to-0 lux, 75 lux, and 220 lux. The light level was measured with the luxometer (TES-1335) sensor positioned at the same height as participants’ left eye and facing the screen, to approximate the amount of light hitting participants’ eyes.

In each experiment, twenty sentences were tested for each condition, resulting in a total of 80 sentences in Exp.1 and 120 sentences in Exp.2. Within a block of 20 sentences, the order of the material remained the same, but the sequence of blocks was fully randomized, and different across participants.

Procedure

Experiments were performed between February 2018 and February 2020 at the Center for Interdisciplinary Research in Music Media and Technology at McGill University, inside a sound-attenuated room. Participants sat on a rigid chair in the room, 2m in front of a 35-inch screen monitor and wearing an infrared binocular eyetracker (Tobii Glasses Pro2, 100 Hz sampling rate).

After a demo of the experiment procedure, participants firstly listened to five sentences (excluded from the test) at 14 dB SNR to familiarise themselves with the test and typical sentences of the speech material.

Before each block, the room and screen luminance levels were adjusted according to the randomisation of the Matlab program. The luminance levels were then fixed throughout the block, to avoid changes in light level inducing task-unrelated pupillary response. Listeners were given at least 1 minute to adjust for the new light level and for the pupil to reach its light reflex target [28], plus the time necessary until they reported that they felt comfortable to continue. All audio stimuli were presented through a Beyer Dynamics DT 990 Pro headphone via an external soundcard (Edirol UA), calibrated at 65 dB SPL. Experiments were run in Matlab 2016b, using Psychtoolbox and custom software. In each trial, the presentation of the speech-shaped noise masker started 3s before the onset of the sentence. This was to provide time for the pupils to recover from the previous trial to avoid carry-over effect (2s after previous trial, initiated by the experimenter) and to measure pre-task baseline pupil diameter (1s). Participants were instructed to fixate on the black cross displayed at the center of the screen. After 3s, the sentence was played along with the continuous noise, and the presentation of the masker noise was turned off 2s after the offset of the sentence, to allow the pupil to reach its peak. Upon the masker offset, participants were prompted by the black cross turning into a circle displayed at the screen center to repeat back the sentence verbally. This delayed verbal response ensured that speech motor commands of the participants did not tarnish the pupillary response corresponding to processing of the sentence perceived. Their verbal responses were scored by the experimenter based on the number of key words correctly repeated. Then the experimenter proceeded to the next trial.

Data analysis

Behavioral data

To be consistent with the statistical approach of the pupil data (where traces were aggregated per block), a generalized linear mixed-effect model was fitted on listeners’ performance, averaged over the 20 sentences of each block. Specifically, a logistic model provided a suitable way to control for the ceiling effect [29]. We considered a model with one fixed factor in Exp.1 (SNR, as a categorical variable) or two fixed factors in Exp.2 (SNR and luminance, both as categorical) and one random factor (subject, also categorical), including random intercepts only. With a single observation per block, the model could not support any further degree of complexity. Further analyses are presented in the S1 Appendix where we considered a trial-based approach that allowed us to examine a more complex model with by-subject random slopes for each fixed factor and look specifically at the effect of position within a block. Overall, they did not provide any further insights into the behavioral data. Chi-squared tests were used to examine main effects and possible interactions. Differences between levels of each factor and interactions were examined with post-hoc Wald test. P values were estimated using the z distribution in the test as an approximation for the t distribution [30].

Pupil data: Preprocessing

The Tobii glasses possess four cameras, two for each eye, along with a camera recording what the participant looked at, including the position of their gaze. Each camera had a sampling frequency of 50 Hz, interleaved by 10 ms (producing a pseudo 100-Hz sampling frequency). Sample points for each camera were first processed separately, removing blinks with a +/- 60 ms window on the edges and reconstructing the missing points with an autoregressive method (fillgaps function in Matlab). Then the signals from the two cameras were recombined, and any value more than 3 standard deviations (SD) below or above the mean pupil diameter was again counted as a blink and reconstructed using fillgaps. We made a further attempt to tag additional outliers by spotting gaze positions 3 SD away from the distribution of “valid” gazes (i.e., excluding gazes corresponding to points already reconstructed) using the Mahalanobis distance, but this approach was too stringent in that it flagged data points that appeared perfectly reasonable.
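The deblinking steps can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: linear interpolation is swapped in for Matlab's autoregressive `fillgaps`, the blink mask is assumed to be already available per sample, and all function names are hypothetical.

```python
import statistics


def pad_blink_mask(is_blink, fs_hz, pad_ms=60.0):
    """Extend every blink sample by pad_ms on both sides (the +/- 60 ms window)."""
    pad = int(round(pad_ms * fs_hz / 1000.0))
    n = len(is_blink)
    padded = [False] * n
    for i, flag in enumerate(is_blink):
        if flag:
            for j in range(max(0, i - pad), min(n, i + pad + 1)):
                padded[j] = True
    return padded


def fill_gaps_linear(samples, mask):
    """Replace masked samples by linear interpolation between valid neighbours.

    Stand-in for the autoregressive reconstruction; gaps touching an edge
    are filled with the nearest valid sample.
    """
    valid = [i for i, m in enumerate(mask) if not m]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    out = list(samples)
    for i, m in enumerate(mask):
        if not m:
            continue
        left = max((v for v in valid if v < i), default=None)
        right = min((v for v in valid if v > i), default=None)
        if left is None:
            out[i] = samples[right]
        elif right is None:
            out[i] = samples[left]
        else:
            w = (i - left) / (right - left)
            out[i] = samples[left] + w * (samples[right] - samples[left])
    return out


def outlier_mask(samples, n_sd=3.0):
    """Flag samples more than n_sd standard deviations away from the mean."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return [abs(x - mean) > n_sd * sd for x in samples]


# One camera at 50 Hz: a single blink sample is padded to 7 samples and refilled.
trace = [4.0] * 5 + [0.0] + [4.0] * 5
mask = pad_blink_mask([x == 0.0 for x in trace], fs_hz=50)
clean = fill_gaps_linear(trace, mask)
```

At 50 Hz per camera, 60 ms corresponds to three samples on each side of a blink. The same `fill_gaps_linear` call could be reused after `outlier_mask` flags values on the recombined signal.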

To evaluate whether the quality of the recordings could change with experimental condition, we extracted the percentage of data points coded as blinks for each trial to use it as a dependent variable in repeated-measures analysis of variance (rm-ANOVA). In Exp.1, there was no effect of SNR [F(3,60) = 1.4, p = 0.241], with an average blink of 11.7% per trial. In Exp.2, there was no effect of SNR [F(1,27) = 2.5, p = 0.122], no effect of luminance [F(2,54) = 1.0, p = 0.379], and no interaction [F(2,54) = 1.5, p = 0.226], with an average blink of 15.2% per trial. With regard to trial exclusion, it is common to exclude any trial that exhibits over 20% of points coded as blinks, and with this criterion, it is common to exclude any subject with over 40% trials rejected (corresponding to 8/20 trials per block in this study). Although the percentage of blinks was on average lower than this value in both experiments, this criterion would have led to several rejections of participants in Exp.2 who provided generally decent data after deblinking. Thus, we decided to allow a more liberal criterion of 45% loss to keep as many participants as possible. A previous study also supported that 45% blink exclusion criterion would not affect group pupil results [31]. On this basis, only three participants had to be excluded from Exp.2 (blinks > 45% on average across 120 sentences, leading to >8/20 trials excluded). Another rm-ANOVA was conducted on the number of excluded trials as dependent variable. In Exp.1, there was no effect of SNR [F(3,60) = 1.1, p = 0.371], with an average of 0.7/20 trials excluded. In Exp.2, there was no effect of SNR [F(1,27) = 2.5, p = 0.123], no effect of luminance [F(2,54) = 0.2, p = 0.795], and no interaction [F(2,54)<0.1, p = 0.976], with an average of 1.0/20 trial excluded.
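The exclusion logic lends itself to a small sketch. The defaults below encode the commonly used criteria quoted in the text (a trial is rejected above 20% blink samples; a participant is excluded above 40% rejected trials). How exactly the authors' relaxed 45% criterion maps onto these two thresholds is not spelled out here, so both limits are left as parameters; the function names are hypothetical.

```python
def rejected_trials(blink_fractions, trial_blink_limit=0.20):
    """Return the indices of trials whose blink fraction exceeds the limit."""
    return [i for i, f in enumerate(blink_fractions) if f > trial_blink_limit]


def exclude_participant(blink_fractions, trial_blink_limit=0.20,
                        subject_reject_limit=0.40):
    """Return True if the share of rejected trials exceeds the subject limit."""
    n_rejected = len(rejected_trials(blink_fractions, trial_blink_limit))
    return n_rejected / len(blink_fractions) > subject_reject_limit


# A block of 20 trials with 8 noisy trials sits exactly at the 40% boundary.
block = [0.05] * 12 + [0.30] * 8
print(rejected_trials(block))       # indices 12 to 19
print(exclude_participant(block))   # False: 8/20 does not exceed 0.40
```

Passing a different `trial_blink_limit` or `subject_reject_limit` makes it easy to check how many participants a stricter or more liberal criterion would remove.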

Finally, all valid traces were low-pass filtered at 10 Hz with a first order Butterworth filter to preserve only cognitively related pupil size modulation [32], and downsampled to 50 Hz to reflect the true sampling rate of the glasses. Processed data were then aggregated per listener by SNR and luminance conditions, aligned by the onset of the response prompt (except in Additional Analyses where traces were also aligned at the sentence onset to illustrate the impact of analytical window on the results).
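The filtering and downsampling step can be sketched without external libraries. The coefficients below come from the standard bilinear transform of a first-order Butterworth low-pass. Whether the authors filtered in a single causal pass or with zero-phase filtering is not stated, so the single pass used here is an assumption, as are the function names and the 100 Hz input rate.

```python
import math


def butter1_lowpass_coeffs(cutoff_hz, fs_hz):
    """First-order Butterworth low-pass coefficients (bilinear transform)."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    b0 = k / (1.0 + k)
    a1 = (k - 1.0) / (k + 1.0)
    return b0, b0, a1  # b0, b1, a1


def lowpass(samples, cutoff_hz=10.0, fs_hz=100.0):
    """Causal single-pass filter: y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    b0, b1, a1 = butter1_lowpass_coeffs(cutoff_hz, fs_hz)
    out = []
    # Start from the steady state of the first sample to avoid an onset transient.
    prev_x = prev_y = samples[0]
    for x in samples:
        y = b0 * x + b1 * prev_x - a1 * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def downsample(samples, factor=2):
    """Keep every factor-th sample (100 Hz -> 50 Hz for factor=2)."""
    return samples[::factor]


# Filter a pseudo 100-Hz trace at 10 Hz, then reduce it to 50 Hz.
trace_100hz = [4.0 + 0.1 * math.sin(2 * math.pi * 1.0 * n / 100.0) for n in range(300)]
trace_50hz = downsample(lowpass(trace_100hz))
```

A first-order filter has a gentle roll-off, so slow task-evoked changes pass almost unchanged while sample-to-sample jitter is attenuated.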

Pupil data analysis

Baseline pupil diameter in each trial was calculated as the average of the pupil trace over the 1 s before the sentence onset. This baseline level was subtracted from the pupil diameter measured from the sentence onset to the repeat prompt to obtain the relative pupil diameter changes elicited by the task. In Additional analyses, to examine whether the impact of luminance was independent of the PPD baseline-correction method, we attempted a posteriori two alternative ways of calculating PPD: as a percentage of the baseline, or as a percentage of the dynamic range. In all these methods, the search for PPD was restricted to a window starting from sentence onset and ending at the verbal response prompt (thereby excluding any confound with the pupil’s arousal induced by verbally repeating the sentence). Note that all sentences were left unprocessed in duration to avoid unnatural acoustic manipulation (compression or stretching of original sentences). This procedure is common in listening effort studies because standardized sentences vary in length, and the variability in sentence duration can be well controlled by consistent trace alignment (either by sentence onset or offset) [33]. Additional analyses took this variability in sentence duration and the alignment method into account and explored whether they affected our results.
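The baseline correction and peak search can be sketched as follows. The sketch assumes a trace stored as a list of diameters in mm with known sample indices for sentence onset and response prompt, and it reports latency relative to the prompt (negative values meaning before the prompt), which is an assumption about the convention. The dynamic-range variant is omitted because its definition is not given in this section. All names are hypothetical.

```python
def baseline_mean(trace, fs_hz, onset_idx, baseline_s=1.0):
    """Mean pupil diameter over the baseline_s seconds preceding sentence onset."""
    n = int(round(baseline_s * fs_hz))
    window = trace[max(0, onset_idx - n):onset_idx]
    return sum(window) / len(window)


def peak_pupil_dilation(trace, fs_hz, onset_idx, prompt_idx, method="subtractive"):
    """Return (PPD amplitude, PPD latency in s relative to the response prompt).

    The peak is searched only between sentence onset and the response prompt.
    method="subtractive" gives mm above baseline; "percent_baseline" gives
    the change as a percentage of the baseline.
    """
    base = baseline_mean(trace, fs_hz, onset_idx)
    window = trace[onset_idx:prompt_idx]
    peak_idx = max(range(len(window)), key=window.__getitem__)
    peak = window[peak_idx]
    if method == "subtractive":
        amplitude = peak - base
    elif method == "percent_baseline":
        amplitude = 100.0 * (peak - base) / base
    else:
        raise ValueError(f"unknown method: {method}")
    latency_s = (onset_idx + peak_idx - prompt_idx) / fs_hz
    return amplitude, latency_s


# Toy 50-Hz trace: 1 s of flat baseline, then a small dilation before the prompt.
trace = [4.0] * 50 + [4.0, 4.1, 4.2, 4.1, 4.0]
amp_mm, lat_s = peak_pupil_dilation(trace, fs_hz=50, onset_idx=50, prompt_idx=55)
amp_pct, _ = peak_pupil_dilation(trace, 50, 50, 55, method="percent_baseline")
```

Because the same absolute dilation yields a different percentage when the baseline is 2.7 mm rather than 6.8 mm, comparing the two methods across luminance settings is one way to probe how much a result depends on the correction chosen.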

A linear mixed-effect model (LME) was fitted on three dependent variables successively: baseline, PPD amplitude, and PPD latency. Each of these metrics was extracted on the trace averaged over a block (20 sentences), so there was only one observation per condition, limiting the model complexity to random intercepts at most. Thus, just like for the behavioral data, we considered a model with one fixed factor in Exp.1 (SNR, as a categorical variable) or two fixed factors in Exp.2 (SNR and luminance, both as categorical) and random intercepts by subject. Further analyses are presented (S1 Appendix) where we considered a trial-based approach that allowed us to examine more complex models with by-subject random slopes for each fixed factor and look specifically at the effect of position within a block. Chi-squared tests were used to examine main effects and possible interactions.

Results and discussion

Experiment 1: Effect of SNR

Behavioral results

As expected, speech intelligibility deteriorated as SNR decreased (Fig 1, left: SNR0, mean 89%, SD 2.3%; SNR7, mean 97.3%, SD 0.9%; SNR14, mean 98.4%, SD 0.5%; Quiet, mean 97.8%, SD 0.7%). The LME analysis confirmed a main effect of SNR [χ2(3) = 12.6, p = 0.005]. Intelligibility was worse at 0 dB than at any other SNR (p<0.020). Importantly, this impairment was relatively minor as intelligibility still approached 90% at 0 dB (and was largely at ceiling at +7 dB and beyond). Thus, the behavioral data confirmed that we chose a range of SNRs where listeners’ performance carried little information about the difficulty of the sentence recognition task. Typically, the behavioral data could not distinguish changes in task difficulty between +7 dB, +14 dB SNR, and quiet conditions.

Fig 1. Performance in the sentence recognition task.

Lists of twenty sentences presented at different SNRs in Exp.1 (left panel), in which 21 listeners sat in a normally-lit room (75 lux). Exp.2 measured performance in the same task (using different sentence lists than in Exp.1) at two SNRs in which 28 listeners sat in the same room under similar luminance settings (75 lux), or in close-to-darkness (0 lux), or in very bright luminance settings (220 lux). Luminance (although it has a huge impact on the pupil) had no effect on behavioral performance. Error bar represents 1 standard error from the mean.

Pupil results

The top panels of Fig 2 show the time course of the pupil diameter expressed in absolute unit (mm). As expected, these averaged traces exhibited a peak shortly after the end of sentences within the 2-sec window that we left before listeners were prompted to repeat the sentence. Past this prompt (time>0s in the abscissa), the pupil diameter increased again due to the verbal response and was therefore less relevant to the cognitive processes engaged in sentence decoding. There were differences in averaged pupil diameter across the four conditions, up to 0.2 mm between 0 and + 7 dB SNR. More importantly (for this type of non-demanding task), the baseline-corrected traces emphasize that the pupil dilated more strongly at 0 dB than in any other condition (bottom-left).

Fig 2. Pupil responses in Experiment 1.

Mean pupil traces across four SNR conditions, measured for 21 listeners in Experiment 1. Traces are expressed in absolute unit (top) or relative to the 1-sec baseline (bottom). Three metrics were extracted: Baseline (top-right), PPD amplitude (bottom-middle), and PPD latency (bottom-right). The pupil dilated more at 0 dB than any other SNR. Error bar and shaded width represents 1 standard error from the mean.

To substantiate these claims, a LME analysis was conducted for each of the three metrics. For baseline, there was no main effect of SNR [χ2(3) = 6.2, p = 0.100], suggesting that the small differences aforementioned (Fig 2, top-right) were not meaningful. For PPD amplitude, there was a main effect of SNR [χ2(3) = 22.8, p<0.001] driven by larger PPDs at 0 dB than at any other SNR (p<0.001). Estimates were -0.07, -0.09, and -0.08 mm, respectively for 7 dB, 14 dB, and quiet conditions relative to the 0 dB condition (Fig 2, bottom-middle; SNR0, mean 0.21mm, SD 0.03mm; SNR7, mean 0.14mm, SD 0.03mm; SNR14, mean 0.12mm, SD 0.03mm; quiet, mean 0.13mm, SD 0.02mm). For PPD latency, there was no main effect of SNR [χ2(3) = 5.8, p = 0.122]. Despite the pattern depicted (Fig 2, bottom-right), differences across SNRs were too weak to be significant, with this block-based approach. Note that a trial-based approach (S1 Appendix) appeared more sensitive to latency differences and did reveal a later PPD at 0 dB than at any other SNR.

Discussion

Exp.1 replicated the effect of SNR on pupillary response [8,9], suggesting that our experimental setups and devices were sensitive and valid enough to capture task-evoked pupillary responses. An SNR of 0 dB elicited significantly bigger PPD and longer latency than +7, +14 dB SNR and quiet conditions, even when sentence intelligibility was at ceiling. The SNRs chosen in our study were easier than in previous studies, which explains the ceiling in sentence recognition and the lack of significant difference between 7 and 14 dB SNR in PPD and latency. This SNR choice is, however, very relevant to cochlear implant (CI) users (our future research), who generally need positive SNRs to understand speech above 50%. Despite this SNR choice being easier than in most studies on NH listeners, our results still show a pattern consistent with the literature. We also replicated a widely reported finding that PPD is larger when sentences are misunderstood or incorrectly repeated [44] (see S2 Appendix). Therefore, we proceeded to Exp.2 by selecting 0 dB and 14 dB SNR due to the significant difference observed between the two conditions (and quiet being a qualitatively different setting).

Experiment 2: Effect of luminance

Behavioral results

Speech intelligibility deteriorated as SNR decreased from +14 to 0 dB, replicating the performance levels obtained in Exp.1 (SNR0, mean 90.5%, SD 1%; SNR14, mean 98.6%, SD 0.3%). Furthermore, intelligibility did not change depending on whether participants listened in a very bright or dark room (Fig 1, right: bright, mean 94.7%, SD 0.7%; medium, mean 94.6%, SD 0.6%; dark, mean 94.5%, SD 0.7%). The LME analysis revealed a main effect of SNR [χ2(1) = 158.4, p<0.001], but no main effect of luminance [χ2(2) = 0.2, p = 0.917] or of its interaction with SNR [χ2(2) = 0.8, p = 0.679]. Therefore, listeners’ performance was the same across the three luminance settings, and (being at 90% and above) it carried little information about the task difficulty.

Pupil results

Visual inspection of Fig 3 suggested that three luminance settings induced big differences in pupil diameter across all subjects, going as low as 2.7 mm on average in bright luminance and as high as 6.8 mm in darkness. In addition, the pupil appeared bigger at 0 dB than at +14 dB, in line with our expectations. Once again, these averaged traces exhibited a peak shortly after the end of sentences, which was easier to observe with baseline-corrected traces (Fig 3, bottom). The PPD induced by the easy task (+14 dB SNR) was roughly similar across the three luminance settings, and of small magnitude. In contrast, the PPD induced by the (slightly more) difficult task was overall larger but it was less pronounced in either darkness or brightness as compared to that observed in medium luminance.

Fig 3. Pupil responses in Experiment 2.

Mean pupil traces recorded for 28 listeners in Experiment 2: a sentence intelligibility task at two difficulty levels (0 and +14 dB SNR) under three luminance settings (dark, medium, and bright). Traces are all aligned by the onset of verbal response, and expressed in absolute units (top panels) to illustrate the massive light-evoked differences in pupil’s diameter, or corrected by their 1-sec baseline (bottom panels) to emphasize the peak pupil dilation. Shaded width represents 1 standard error from the mean.

To substantiate these claims, LME analyses were conducted for each of the three pupil metrics.

Baseline: There was no main effect of SNR [χ2(1) = 1.2, p = 0.263], but a main effect of luminance [χ2(2) = 225.7, p<0.001] without interaction [χ2(2) = 1.3, p = 0.529]. The pupil baseline reacted hugely to luminance (Fig 4, left) but as in Exp 1 there was not sufficient evidence that it depended on SNR.

Fig 4. Pupil metrics in Experiment 2.

Three pupil metrics, namely baseline (left), PPD amplitude (middle) and latency (right), extracted from block-averaged pupil traces recorded in experiment 2. Error bar represents 1 standard error from the mean.

PPD amplitude: There were both a main effect of SNR confirming larger PPDs at 0 than at +14 dB [SNR0, mean 0.15mm, SD 0.02mm; SNR14, mean 0.1mm, SD 0.01mm; χ2(1) = 11.8, p<0.001] and a main effect of luminance [bright, mean 0.1mm, SD 0.02mm; medium, mean 0.16mm, SD 0.03mm; dark, mean 0.13mm, SD 0.02mm; χ2(2) = 11.8, p = 0.003] without interaction between the two [χ2(2) = 2.4, p = 0.304]. PPDs were larger under medium luminance than in brightness (p = 0.002) or in darkness (although this latter comparison was not significant, p = 0.074) (Fig 4, middle). In other words, although the traces hardly differentiated the two SNRs in either bright or dark luminance, the current evidence is that they were both suboptimal at 0 or 14 dB, compared to medium luminance settings. This is one of the key findings of this article.

PPD latency: There was a main effect of SNR confirming later PPDs at 0 than at +14 dB [SNR0, mean -0.5s, SD 0.2s; SNR14, mean -0.04s, SD 0.3s; χ2(1) = 10.8, p = 0.006], an effect that was revealed in Exp.1 only with the trial-based approach (S1 Appendix) but not with the block-based approach, suggesting it was presumably a power limitation. There was no effect of luminance [χ2(2) = 3.4, p = 0.185] or interaction [χ2(2) = 1.0, p = 0.606]. On average across luminance settings, the PPD occurred 432ms later at 0 dB relative to 14 dB (Fig 4, right).
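
The p-values attached to the χ2 statistics above can be checked by hand. The following is a minimal sketch, assuming the statistics come from likelihood-ratio comparisons of nested LME models (an assumption; this section does not spell out the test). The function names `chi2_sf` and `likelihood_ratio_stat` are hypothetical helpers, and the closed forms cover only 1 and 2 degrees of freedom, which are the cases reported for Exp.2.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail p-value of a chi-square statistic (closed forms for df = 1 or 2)."""
    if stat < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        # A chi-square variable with 1 df is a squared standard normal.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # A chi-square variable with 2 df is exponential with mean 2.
        return math.exp(-stat / 2.0)
    raise ValueError("closed form implemented for df = 1 or 2 only")


def likelihood_ratio_stat(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood-ratio statistic for two nested models (hypothetical inputs)."""
    return 2.0 * (loglik_full - loglik_reduced)


# Main effect of luminance on PPD amplitude, as reported: chi2(2) = 11.8
print(round(chi2_sf(11.8, 2), 3))  # prints 0.003
```

For other degrees of freedom (e.g., the 3-df tests of Exp.1), a general chi-square survival function from a statistics library would be needed.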

Discussion

Exp2 showed that the magnitude and timing of pupillary responses varied in different light settings, suggesting that task-evoked pupillary responses were affected by the luminance level. For baseline, luminance affected the absolute pupil diameter evenly across SNR conditions. Arguably, for a sentence recognition task without cognitive involvement prior to the trial (i.e., passively listening to the background noise), the response of pupil diameter in different luminance is dominated by the ANS [12]. As a reminder, SNS directly controls the pupil dilator muscle and PNS indirectly controls the pupil sphincter (constrictor) muscle. The pupillary light reflex is regulated by the PNS pathway [14,34], and when the cognitive stage kicks in (i.e., listening to a target sentence and preparing for verbal repetitions), pupillary dilation induced by task difficulty is regulated by PNS inhibition and SNS activity [20,35]. Firstly, the order of magnitude is remarkable: PPDs (of about 0.1 to 0.2 mm) were on the order of 10–20 times smaller in magnitude than luminance-evoked changes, consistent with previous studies on the comparative effect size of luminance-evoked and task-evoked pupillary responses [1,15–17]. Secondly, although the task difficulties were the same in each luminance condition (and the behavior results supported this idea), task-evoked PPDs were different. For a given SNR level, when measured in dark (0 lux) and bright (220 lux) environments, the PPDs were smaller than in the medium luminance (75 lux) environment. It seems that both extreme constriction and dilation restrain the range of task-evoked pupillary response, even after baseline correction. In other words, in those sub-optimal luminance conditions, it is likely that PPD is underestimated. The latency of PPDs, on the other hand, seems relatively unaffected by luminance (see Additional analyses for how the calculation method of latency could change this finding).

Our finding that medium luminance is more conducive to larger task-evoked pupillary responses than bright luminance is consistent with some previous findings [21,22]. But decreasing the luminance from medium to dark did not further increase the pupillary response as in Pan et al. [22]; instead, in our current study, PPD in the dark tended to be smaller than in medium luminance. The difference could be due to the calculation method of pupillary responses: in Pan et al. [22], pupillary responses were calculated as the mean pupil size, which contained both the baseline and task-evoked response; in our current study, PPD was baseline-corrected to better capture the time-aligned pupillary response to sentence recognition. It is likely that after correcting for the baseline, which increased hugely in the dark, the benefit of low luminance in inhibiting PNS when calculating PPD is scaled correspondingly. However, this scaling has been suggested to be linear, as demonstrated in Reilly et al. [36]; therefore, PPD should not decrease from medium luminance to dark if the only factor that had changed is the decreased contribution of PNS. Perhaps individual differences could introduce some non-linear factors in the pupillary response. Some evidence for this possibility can be glimpsed from Pan et al. [22], who reported low group agreement specifically in low luminances: while all participants showed a consistent pattern at medium luminances, there were a few participants who showed no modulation at low luminances. But participants in our study and in Pan et al. [22] were relatively homogeneous, leaving little room for investigating person-specific factors. Note that identifying and mapping out linear or non-linear relationships across participants on pupillary measures will not only help to control for confounds in task-evoked interpretations, but also provide meaningful methods to investigate complex interactions between PNS and SNS [20,36].

The lack of significant interaction between luminance and SNR conditions in our results shows that this bias is relatively consistent across SNRs, suggesting that the luminance might not interfere greatly with experimental conditions or task difficulty in pupillary responses (e.g., a difficult task will always elicit a bigger pupillary response than an easy task, and incorrectly repeated sentences elicit a bigger pupillary response than correctly repeated sentences, see S2 Appendix). However, a closer examination of the trend in our results raises potential issues. Note that from dark to bright luminance levels, PPD showed an inverted U-shape (Fig 4), suggesting that there may exist a luminance level where there will be an even bigger SNR contrast in the PPD response, and the interaction between SNR and luminance might reach significance at the ‘tipping point’ of that inverted U-shape (as already shown in the bigger separation of error bars in medium compared to dark/bright luminance conditions). A similar inverted U-shape was found by Pan et al. [22], where the biggest and most consistent difference in pupillary response between Hard and Easy auditory math tasks occurred at medium luminance levels among all ten luminance levels tested. To confirm this speculation, future studies need to apply a wider range of changes in the luminance to map out the entire psychometric function between luminance and task-evoked pupillary response. Without this knowledge, we might not be confident enough to synthesize knowledge across research sites to understand and compare the effect of task difficulty on pupillary response. To illustrate, even when Lab A and Lab B used the same devices, experimental designs and analytical pipelines, Lab A might observe an effect that is bigger than in Lab B, just because Lab A chose a luminance level that is closer to the ‘tipping point’ of the inverted U-shape.

Additional analyses

To further understand the complexity of the additive effect of pupillary light reflex and task-evoked pupillary response, we performed additional analyses to explore whether the impact of luminance we had observed could be affected by the analysis methods. Specifically, we examined baseline correction methods when calculating PPD and the impact of sentence duration. As discussed above, applying baseline correction might be partially responsible for the differences in task-evoked pupillary responses observed in our current study and in Pan et al. [22]. Several methods have been proposed in previous studies but it is unclear whether they could control for the impact of the luminance level on task-evoked pupillary responses. Also, sentence duration varies and this variability contributes to the variability of the analytical window (sentence onset to the repeat prompt), hence possibly generating variability in PPD amplitude and latency measurement.

Effect of PPD calculation methods

Up to now, we calculated PPD in mm by subtracting the baseline diameter from the pupil diameter in the analytical window (sentence onset to the repeat prompt). Other methods have also been proposed to calculate PPD [33,36,37]. Here, we explored two alternatives: 1) as a percentage relative to baseline [38,39], and 2) as a percentage relative to the dynamic range of a given subject [40]. This latter method was particularly suited to Exp.2, where the range evoked by luminance differences was considerable. To this aim, we pooled all 120 trials for a given subject, and extracted the 0.5th and 99.5th percentiles of the distribution of sample points to define the luminance-evoked dynamic range for that subject. In Exp.1, we could not obtain the same metric, but we followed the same approach to assess the task-evoked dynamic range across the four SNRs (80 trials). Note that this is different from other approaches where the dynamic range was measured outside of the task when the subject is at rest [41]. The dynamic range used here was presumably larger than it would have been had it been extracted prior to the experimental protocol.
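
The three ways of expressing PPD described above can be sketched as follows. This is an illustrative sketch only, not the authors' analysis code: the function names `percentile` and `ppd_variants`, the linear-interpolation percentile rule, and the input layout (a list of diameters in mm for the analytical window, a scalar baseline, and a pooled list of all samples for the subject) are assumptions made for the example.

```python
def percentile(values, q):
    """Percentile (q in 0..100) with linear interpolation between sorted samples."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one sample")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)                      # pos is non-negative, so int() floors it
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def ppd_variants(window_mm, baseline_mm, all_samples_mm):
    """Peak pupil dilation expressed in mm, in % of baseline, and in % of dynamic range.

    window_mm      -- pupil diameters in the analytical window of one trace
    baseline_mm    -- mean diameter in the pre-stimulus baseline period
    all_samples_mm -- every sample pooled across the subject's trials
    """
    ppd_mm = max(window_mm) - baseline_mm          # subtractive baseline correction
    range_lo = percentile(all_samples_mm, 0.5)     # lower bound of dynamic range
    range_hi = percentile(all_samples_mm, 99.5)    # upper bound of dynamic range
    return {
        "mm": ppd_mm,
        "pct_baseline": 100.0 * ppd_mm / baseline_mm,
        "pct_range": 100.0 * ppd_mm / (range_hi - range_lo),
    }
```

The same peak in mm maps to a smaller percentage when the baseline is large (dark) and to a larger one when it is small (bright), which is the scaling issue examined in this analysis.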

Fig 5 shows the averaged traces in each experiment, for the first (top) and second alternative (bottom). An LME analysis was reiterated for these two alternative ways of calculating PPD amplitude (all results were identical for PPD latency). Expressed as a proportional change from baseline, there was a main effect of SNR [χ2(3) = 16.6, p<0.001] driven by PPDs between 1.7 and 2.2% larger at 0 dB than at any other SNR in Exp.1 (p<0.001) (Fig 5, top-left). In Exp.2, a considerably large baseline in darkness would underestimate the PPD amplitude whereas a small baseline in brightness would potentially overestimate it (Fig 5, top-right). An LME analysis revealed both a main effect of SNR [χ2(1) = 10.3, p = 0.001] and a main effect of luminance [χ2(2) = 8.3, p = 0.016] without interaction [χ2(2) = 2.6, p = 0.275]. This approach would likely better preserve SNR differences in bright luminance and hinder them in dark luminance, but the main findings were unchanged qualitatively. In the second alternative, expressed as a proportional change relative to the dynamic range, there was a main effect of SNR in Exp.1 [χ2(3) = 22.3, p<0.001] driven by PPDs between 4.7 and 5.8% larger at 0 dB than at any other SNR (p<0.001) (Fig 5, bottom-left). In Exp.2, there were both a main effect of SNR [χ2(1) = 14.9, p<0.001] and a main effect of luminance [χ2(2) = 11.3, p = 0.003] without interaction [χ2(2) = 2.5, p = 0.288] (Fig 5, bottom-right). Once again, the results were qualitatively unchanged. In other words, we conclude that one cannot “repair” the poor sensitivity of pupil reading in darkness or in very bright luminance by choosing a better method to calculate PPD.

Fig 5. Averaged pupil traces and metrics.

Averaged pupil traces and their respective block-averaged PPD amplitudes expressed as a proportional change from baseline (top) or as a proportional change relative to the dynamic range (bottom), in both experiments. In Exp.2, the dynamic range was much greater than in Exp.1 (as it was induced by luminance differences instead of subtle task differences such as SNR), resulting in a narrower scale of percentage changes. Error bar and shaded area represented 1 standard error from the mean.

Effect of sentence duration

The longer the sentence, the more decoding must take place, and as a result, the pupil response may increase and be delayed incrementally as a function of sentence complexity [42]. To simplify this problem, here, we only looked at their duration (i.e., making a debatable assumption that a longer sentence necessarily contained more complex content). We grouped all materials into five duration bins, with centers corresponding to 1660, 2060, 2280, 2460, and 2700 ms. These values were extracted from the 10th, 30th, 50th, 70th, and 90th percentile of the distribution of all sentence materials. Adding this variable as a fixed factor in the aforementioned LME model, we found a main effect of duration on PPD amplitude [χ2(1) = 6.3, p = 0.012], but no interaction with SNR [χ2(3) = 5.2, p = 0.155]. The main effect of duration did not reach significance for PPD latency [χ2(1) = 2.5, p = 0.110], and there was no interaction with SNR [χ2(3) = 5.7, p = 0.126]. Note that this effect of duration on latency was certainly present when traces were aligned relative to sentence onset [χ2(1) = 14.9, p<0.001] (the interaction with SNR remaining absent [χ2(3) = 5.3, p = 0.144]). This suggests that the effect of sentence duration on latency is largely taken care of by aligning the responses from the sentence offset (and is a good reason why one may want to follow this recommendation, see comparison of top and bottom panels in Fig 6). Exp.1 thus confirmed that PPD is generally larger with longer (and likely more complex) sentences, regardless of SNR. The question arose as to whether the same could be said of bright and dark luminance settings.
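
The binning step can be illustrated with a short sketch. Because the bin centers sit at the 10th, 30th, 50th, 70th, and 90th percentiles, the sketch assumes five equally populated bins with edges at the 20th, 40th, 60th, and 80th percentiles; this edge rule is an assumption, as the text only reports the centers. The names `percentile` and `duration_bins` are hypothetical.

```python
CENTER_PCTS = (10, 30, 50, 70, 90)   # bin centers, as reported in the text
EDGE_PCTS = (20, 40, 60, 80)         # assumed bin edges (quintile boundaries)


def percentile(values, q):
    """Percentile (q in 0..100) with linear interpolation between sorted samples."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one sample")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)                      # pos is non-negative, so int() floors it
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def duration_bins(durations_ms):
    """Return (bin centers in ms, bin index 0..4 for each sentence duration)."""
    edges = [percentile(durations_ms, q) for q in EDGE_PCTS]
    centers = [percentile(durations_ms, q) for q in CENTER_PCTS]
    # A sentence falls into the bin given by how many edges it strictly exceeds.
    labels = [sum(d > e for e in edges) for d in durations_ms]
    return centers, labels
```

The bin index (or its center in ms) can then enter the model as the duration factor.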

Fig 6. Pupil metrics across different sentence durations in Experiment 1.

Baseline-corrected pupil traces measured in Exp.1 across different duration bins, aligned from the sentence onset (top) or from the response prompt (bottom). Longer sentences elicited larger PPDs, and whether this was accompanied by later PPDs depended on time alignment: The delayed PPD with longer stimuli is largely cancelled when sentences are aligned by their offset. Shaded area represented 1 standard error from the mean.

In Exp.2, the sentence lists used were slightly different from Exp.1, so the five duration bins had centers at 1680, 2040, 2280, 2500, and 2760 ms, once again extracted from the 10th, 30th, 50th, 70th, and 90th percentile of the respective materials. The factor duration did not lead to a main effect on PPD amplitude [χ2(1) = 0.3, p = 0.578], and there was no interaction with SNR [χ2(1)<0.1, p = 0.757], luminance [χ2(2) = 1.3, p = 0.526], or in a 3-way [χ2(2) = 2.3, p = 0.317]. For PPD latency, there was a main effect of duration [χ2(1) = 5.6, p = 0.018], interacting with SNR [χ2(1) = 5.6, p = 0.018], but not with luminance [χ2(2) = 1.0, p = 0.594], or in a 3-way [χ2(2) = 0.1, p = 0.946]. These results were not straightforward to interpret. On one hand, luminance never interacted in these analyses, suggesting that it would not play any role. On the other hand, this analysis failed to replicate larger PPDs for longer sentences and instead found dubious effects on latency. Fig 7 shows the averaged traces for each duration bin, aligned by sentence onset (top) or by sentence offset, exactly 2 seconds before the prompt to respond verbally (bottom). No direct relationship was apparent between the size of the PPD and the sentence duration (as it was in Fig 6). As for its latency, any effect of duration was minimal (including at each SNR separately, not plotted here). Note that the effect of duration on latency was present when traces were aligned from sentence onset [χ2(1) = 22.3, p<0.001], as in Exp.1, without interaction with SNR [χ2(1) = 3.8, p = 0.051], luminance [χ2(2) = 0.2, p = 0.918], or in a 3-way [χ2(2) = 0.8, p = 0.679]. So, the PPD did occur later with longer sentences at least with respect to sentence onset, but it remains puzzling that PPD amplitude did not follow the expected trend.

One explanation is that the dynamics of the pupils recorded in dark or bright luminance tended to be less stereotypical (in addition to being of smaller magnitude overall after aggregation across trials) than in medium luminance; and for this reason, they did not react to sentence duration as systematically as they would have in medium luminance. This interpretation would point to an interaction between duration and luminance, but perhaps this interaction is subtle and would require more power to detect (than the 40 sentences per luminance level used here). The striking contrast between Figs 6 and 7 leads us to conclude that dark or bright luminance settings are not only less-than-ideal in terms of PPD amplitude, they also make the traces less dependent on sentence duration.

Fig 7. Pupil metrics across different sentence durations in Experiment 2.

Baseline-corrected pupil traces measured in Exp.2 across different duration bins, aligned from the sentence onset (top) or from the response prompt (bottom). Longer sentences elicited larger PPDs, and whether this was accompanied by later PPDs depended on time alignment: The delayed PPD with longer stimuli is largely compensated for when sentences are aligned by their offset. Shaded area represented 1 standard error from the mean.

Conclusion

Our results raised an under-examined but crucial issue when designing studies using pupillometry, particularly in the context of speech communication. Although previous studies and Exp1 confirmed that pupillary response is a reliable measure of listening effort in simple tasks, Exp2 suggested that the luminance of the experimental setup could affect the magnitude (and possibly the significance level) of the observed task-evoked pupillary response, due to the overriding ANS impact on the pupillary muscle system. We could not map out the entire relation between luminance and PPD, due to 1) the long time required for allowing participants to adjust to many new luminance levels at different SNRs, and 2) physical constraints of our lab (i.e., luminance level could not exceed 220 lux). But the results of Exp2 suggest a main effect of luminance on PPD and possibly an inverted U-shape relation between luminance condition and PPD, which is sufficient to raise concerns on the validity and reliability of pupillometry studies: effects observed in pupillary response are likely to be confounded by the luminance level of the experimental setting. Although most pupillometry studies report luminance level, there are inconsistencies in the method (e.g. measured close to the screen, directly at the participant’s eyes, or as ambient light in the room) and inconsistencies in the unit reported (lux vs cd/m2). Also, SNR manipulation was used in Exp2 because the impact of SNR on pupillary response was well validated in past studies. But other cognitive aspects and task difficulty have been examined using pupillometry, for instance working memory capacity, spectral resolution, divided attention, background noise type etc [3,43,44], and it is very likely that these effect sizes also depend on luminance level. With pupillometry being more and more popular, this fundamental question is increasingly important to address before the technique further expands into clinical applications.

Unfortunately, as suggested in Książek et al. [23] and our Additional analyses, there might not be an effective method to correct for the confound of luminance at the post-hoc analytical stage, due to the convoluted impact of ANS and central cognitive processing on the pupillary response. PPD latency was here relatively robust to luminance difference, but PPD latency is generally not as sensitive or as widely used in pupillometry studies as PPD amplitude. Certainly, we have not exhausted all the possible analytical measures, but this challenge further highlights the importance of understanding and controlling for luminance at the planning and execution stages. A guideline (similar to [33,37]) might be useful for the research community to standardize the execution and reporting of luminance, which could potentially bias the observed effect size of factors of interest. Our results add a cautionary note to future clinical applications of pupillometry. Consistent with past studies and guidelines, we do recommend medium luminance based on our findings, but a uniform luminance setting might not be realistic for all clinics. Therefore, to ensure the validity and comparability of pupillometry studies across clinics, a system with integrated and better luminance control might be preferable. For instance, a virtual reality system with an enclosed eyetracker and a highly controlled visual field could be a standardised model for distribution.

To summarize, pupillometry remains a powerful tool to reveal the hidden cost of speech communication. To further apply this tool in clinical settings, we need strict examinations of factors that can affect task-evoked pupillary response, in order to enhance the validity and generalizability of pupillometry in cognitive hearing and clinical research.

Supporting information

S1 Appendix. Results using trial-based approach for calculating PPD.

(DOCX)

S2 Appendix. Comparing pupil responses for correctly and incorrectly repeated sentences.

(DOCX)

S1 File

(7Z)

Acknowledgments

We are grateful to all participants for their time and effort. We are also grateful for the constructive feedback from our reviewers and academic editor.

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

This research was supported by a grant from the Quebec government (Mitacs Accelerate https://www.mitacs.ca/en/programs/accelerate) in collaboration with an industrial partner Oticon Medical Canada (https://www.oticonmedical.com/) [grant number IT10517]. The funding was issued to Dr. Alexandre Lehmann and Dr. Mickael Deroche, for the postdoctoral work of Dr. Yue Zhang. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Beatty J. Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychological bulletin. 1982 Mar;91(2):276.
  • 2. Granholm E, Asarnow RF, Sarkin AJ, Dykes KL. Pupillary responses index cognitive resource limitations. Psychophysiology. 1996 Jul;33(4):457–61. doi: 10.1111/j.1469-8986.1996.tb01071.x
  • 3. Zekveld AA, Kramer SE. Cognitive processing load across a wide range of listening conditions: Insights from pupillometry. Psychophysiology. 2014 Mar;51(3):277–84. doi: 10.1111/psyp.12151
  • 4. Winn MB. Rapid release from listening effort resulting from semantic context, and effects of spectral degradation and cochlear implants. Trends in Hearing. 2016 Sep;20:2331216516669723. doi: 10.1177/2331216516669723
  • 5. Zekveld AA, Kramer SE, Festen JM. Cognitive load during speech perception in noise: The influence of age, hearing loss, and cognition on the pupil response. Ear and hearing. 2011 Jul 1;32(4):498–510. doi: 10.1097/AUD.0b013e31820512bb
  • 6. Koelewijn T, Zekveld AA, Festen JM, Kramer SE. Pupil dilation uncovers extra listening effort in the presence of a single-talker masker. Ear and Hearing. 2012 Mar 1;33(2):291–300. doi: 10.1097/AUD.0b013e3182310019
  • 7. Koelewijn T, Shinn-Cunningham BG, Zekveld AA, Kramer SE. The pupil response is sensitive to divided attention during speech processing. Hearing research. 2014 Jun 1;312:114–20. doi: 10.1016/j.heares.2014.03.010
  • 8. Ohlenforst B, Zekveld AA, Lunner T, Wendt D, Naylor G, Wang Y, et al. Impact of stimulus-related factors and hearing impairment on listening effort as indicated by pupil dilation. Hearing Research. 2017 Aug 1;351:68–79. doi: 10.1016/j.heares.2017.05.012
  • 9. Ohlenforst B, Wendt D, Kramer SE, Naylor G, Zekveld AA, Lunner T. Impact of SNR, masker type and noise reduction processing on sentence recognition performance and listening effort as indicated by the pupil dilation response. Hearing research. 2018 Aug 1;365:90–9. doi: 10.1016/j.heares.2018.05.003
  • 10. Wendt D, Hietkamp RK, Lunner T. Impact of noise and noise reduction on processing effort: A pupillometry study. Ear and hearing. 2017 Nov 1;38(6):690–700. doi: 10.1097/AUD.0000000000000454
  • 11. Tryon WW. Pupillometry: A survey of sources of variation. Psychophysiology. 1975 Jan;12(1):90–3. doi: 10.1111/j.1469-8986.1975.tb03068.x
  • 12. Beatty J, Lucero-Wagoner B. The pupillary system. Handbook of psychophysiology. 2000;2(142–162).
  • 13. Loewenfeld IE, Lowenstein O. The pupil: Anatomy, physiology, and clinical applications. Butterworth-Heinemann; 1993.
  • 14. Wang Y, Zekveld AA, Naylor G, Ohlenforst B, Jansma EP, Lorens A, et al. Parasympathetic nervous system dysfunction, as identified by pupil light reflex, and its possible connection to hearing impairment. PloS one. 2016 Apr 18;11(4):e0153566. doi: 10.1371/journal.pone.0153566
  • 15. Winn M, Edwards J. The impact of spectral resolution on listening effort revealed by pupil dilation. The Journal of the Acoustical Society of America. 2013 Nov;134(5):4233.
  • 16. Xu J, Wang Y, Chen F, Choi E. Pupillary response based cognitive workload measurement under luminance changes. In IFIP Conference on Human-Computer Interaction 2011 Sep 5 (pp. 178–185). Springer, Berlin, Heidelberg.
  • 17. Peysakhovich V, Causse M, Scannella S, Dehais F. Frequency analysis of a task-evoked pupillary response: Luminance-independent measure of mental effort. International Journal of Psychophysiology. 2015 Jul 1;97(1):30–7. doi: 10.1016/j.ijpsycho.2015.04.019
  • 18. Zhang Y, Lehmann A, Deroche M. Disentangling listening effort and memory load beyond behavioural evidence: Pupillary response to listening effort during a concurrent memory task. PloS one. 2021 Mar 3;16(3):e0233251. doi: 10.1371/journal.pone.0233251
  • 19. Van der Stoep N, Van der Smagt MJ, Notaro C, Spock Z, Naber M. The additive nature of the human multisensory evoked pupil response. Scientific reports. 2021 Jan 12;11(1):1–2.
  • 20. Steinhauer SR, Siegle GJ, Condray R, Pless M. Sympathetic and parasympathetic innervation of pupillary dilation during sustained processing. International journal of psychophysiology. 2004 Mar 1;52(1):77–86. doi: 10.1016/j.ijpsycho.2003.12.005
  • 21. Peysakhovich V, Vachon F, Dehais F. The impact of luminance on tonic and phasic pupillary responses to sustained cognitive load. International Journal of Psychophysiology. 2017 Feb 1;112:40–5. doi: 10.1016/j.ijpsycho.2016.12.003
  • 22. Pan J, Klímová M, McGuire JT, Ling S. Arousal-based pupil modulation is dictated by luminance. Scientific reports. 2022 Jan 26;12(1):1–1.
  • 23. Książek P, Zekveld AA, Wendt D, Fiedler L, Lunner T, Kramer SE. Effect of Speech-to-Noise Ratio and Luminance on a Range of Current and Potential Pupil Response Measures to Assess Listening Effort. Trends in hearing. 2021 Apr;25:23312165211009351. doi: 10.1177/23312165211009351
  • 24. Wang Y, Zekveld AA, Wendt D, Lunner T, Naylor G, Kramer SE. Pupil light reflex evoked by light-emitting diode and computer screen: Methodology and association with need for recovery in daily life. PLoS One. 2018 Jun 13;13(6):e0197739. doi: 10.1371/journal.pone.0197739
  • 25. Micula A, Rönnberg J, Fiedler L, Wendt D, Jørgensen MC, Larsen DK, et al. The effects of task difficulty predictability and noise reduction on recall performance and pupil dilation responses. Ear and hearing. 2021 Nov;42(6):1668. doi: 10.1097/AUD.0000000000001053
  • 26. Rothauser EH. IEEE recommended practice for speech quality measurements. IEEE Trans. on Audio and Electroacoustics. 1969;17:225–46.
  • 27. Vaillancourt V, Laroche C, Mayer C, Basque C, Nali M, Eriks-Brophy A, et al. Adaptation of the hint (hearing in noise test) for adult canadian francophone populations: Adaptación del hint (prueba de audición en ruido) para poblaciones de adultos canadienses francófonos. International Journal of Audiology. 2005 Jan 1;44(6):358–61.
  • 28. Mathôt S. Pupillometry: Psychology, physiology, and function. Journal of Cognition. 2018;1(1). doi: 10.5334/joc.18
  • 29. Jaeger TF. Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of memory and language. 2008 Nov 1;59(4):434–46. doi: 10.1016/j.jml.2007.11.007
  • 30. Mirman D. Growth curve analysis and visualization using R. Chapman and Hall/CRC; 2017 Sep 7.
  • 31. Burg EA, Thakkar T, Fields T, Misurelli SM, Kuchinsky SE, Roche J, et al. Systematic comparison of trial exclusion criteria for pupillometry data analysis in individuals with single-sided deafness and normal hearing. Trends in Hearing. 2021 May;25:23312165211013256. doi: 10.1177/23312165211013256
  • 32. Klingner J, Kumar R, Hanrahan P. Measuring the task-evoked pupillary response with a remote eye tracker. In Proceedings of the 2008 symposium on Eye tracking research & applications 2008 Mar 26 (pp. 69–72).
  • 33. Winn MB, Wendt D, Koelewijn T, Kuchinsky SE. Best practices and advice for using pupillometry to measure listening effort: An introduction for those who want to get started. Trends in hearing. 2018 Sep;22:2331216518800869. doi: 10.1177/2331216518800869
  • 34. McDougal DH, Gamlin PD. Autonomic control of the eye. Comprehensive physiology. 2015 Jan;5(1):439. doi: 10.1002/cphy.c140014
  • 35. Slade K, Kramer SE, Fairclough S, Richter M. Effortful listening: Sympathetic activity varies as a function of listening demand but parasympathetic activity does not. Hearing Research. 2021 Oct 1;410:108348. doi: 10.1016/j.heares.2021.108348
  • 36. Reilly J, Kelly A, Kim SH, Jett S, Zuckerman B. The human task-evoked pupillary response function is linear: Implications for baseline response scaling in pupillometry. Behavior research methods. 2019 Apr;51(2):865–78. doi: 10.3758/s13428-018-1134-4
  • 37. Mathôt S, Fabius J, Van Heusden E, Van der Stigchel S. Safe and sensible preprocessing and baseline correction of pupil-size data. Behavior research methods. 2018 Feb;50(1):94–106. doi: 10.3758/s13428-017-1007-2
  • 38. Hess EH, Polt JM. Pupil size in relation to mental activity during simple problem-solving. Science. 1964 Mar 13;143(3611):1190–2. doi: 10.1126/science.143.3611.1190
  • 39. Johnson EL, Miller Singley AT, Peckham AD, Johnson SL, Bunge SA. Task-evoked pupillometry provides a window into the development of short-term memory capacity. Frontiers in psychology. 2014 Mar 13;5:218. doi: 10.3389/fpsyg.2014.00218
  • 40. Piquado T, Isaacowitz D, Wingfield A. Pupillometry as a measure of cognitive effort in younger and older adults. Psychophysiology. 2010 May;47(3):560–9.
  • 41. Alhanbali S, Munro KJ, Dawes P, Carolan PJ, Millman RE. Dimensions of self-reported listening effort and fatigue on a digits-in-noise task, and association with baseline pupil size and performance accuracy. International Journal of Audiology. 2021 Oct 1;60(10):762–72. doi: 10.1080/14992027.2020.1853262
  • 42. Winn MB, Moore AN. Pupillometry reveals that context benefit in speech perception can be disrupted by later-occurring sounds, especially in listeners with cochlear implants. Trends in Hearing. 2018 Oct;22:2331216518808962. doi: 10.1177/2331216518808962
  • 43. Winn MB, Edwards JR, Litovsky RY. The impact of auditory spectral resolution on listening effort revealed by pupil dilation. Ear and hearing. 2015 Jul;36(4):e153. doi: 10.1097/AUD.0000000000000145
  • 44. Zekveld AA, Koelewijn T, Kramer SE. The pupil dilation response to auditory stimuli: Current state of knowledge. Trends in hearing. 2018 Sep;22:2331216518777174. doi: 10.1177/2331216518777174

Decision Letter 0

Sebastiaan Mathôt

18 May 2022

PONE-D-22-07841

Luminance effects on pupil dilation in speech-in-noise recognition

PLOS ONE

Dear Dr. Zhang,

Thank you for submitting your manuscript to PLOS ONE. I have received two reviews; one reviewer prefers to remain anonymous; the other is Jamie Reilly. I have also read the manuscript myself. As you will see, the reviewers are generally positive about your work, but feel that there should be a more thorough discussion of previous work on the same topic. I agree with their assessment. Right now, you focus mostly on Steinhauer's 2004 paper, but this line of research actually goes back to the 60s, and both reviewers provide several relevant references. Therefore, I invite you to address this concern, as well as the other points raised by the reviewers, in a revision.

Please submit your revised manuscript by Jul 02 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Sebastiaan Mathôt, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please change "female" or "male" to "woman" or "man" as appropriate, when used as a noun (see for instance https://apastyle.apa.org/style-grammar-guidelines/bias-free-language/gender).

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This research was supported by a grant from the Quebec governmet (Mitacs Accelerate) in collaboration with an industrial partner Oticon Medical Canada [grant number IT10517]. We are grateful to all participants for their time and effort.”

We note that you have provided additional information within the Acknowledgements Section that is not currently declared in your Funding Statement. Please note that funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“This research was supported by a grant from the Quebec governmet (Mitacs Accelerate https://www.mitacs.ca/en/programs/accelerate) in collaboration with an industrial partner Oticon Medical Canada (https://www.oticonmedical.com/) [grant number IT10517]. The funding was issued to Dr. Alexandre Lehmann and Dr. Mickael Deroche, for the postdoctoral work of Dr. Yue Zhang.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

5. Please upload a new copy of Figures 2, 3, 4, 5, 6, 7 and 8 as the detail is not clear. Please follow the link for more information: https://blogs.plos.org/plos/2019/06/looking-good-tips-for-creating-your-plos-figures-graphics/

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This manuscript describes a study of the effect of luminance and SNR on the peak pupil dilation response. Although the study methods are sound, the embedding of the study in the context of existing studies can be improved substantially. Please find my suggestions below.

1. Embedding in existing literature. The authors indicate that the study described in the manuscript is one of few studies addressing the combined influence of luminance and (auditory) task load on the pupil response (e.g. line 562). However, the authors did not include a rigorous description of current work. Specifically, the authors should include the studies cited below (and probably more relevant studies can easily be found) to provide a more comprehensive overview of the current state of knowledge. The current studies up to date suggest that the current results are inconclusive, namely some studies have found evidence for an effect of luminance on the PPD evoked by cognitive tasks, and some did not show this effect. The authors could expand the discussion about the factors that may be associated with this current inconclusive evidence.

2. Currently, most published “guidelines” on pupillometry recommend to use a medium illumination level. How would the current results affect this recommendation? Importantly, the authors report a main effect of luminance, and the post-hoc analysis shows that the pupil response in bright conditions differs from that in medium level conditions. However, the difference between the dark and medium level condition is not significant. Regardless, the remaining part of the manuscript treats this absent effect as if this difference was indeed significant (e.g. line 395). Please adapt (why would you test the main effect if you don’t follow up on the results appropriately?).

3. Related to the incomplete account of the current literature, the authors do not explain and discuss the potential physiological mechanisms that may account for (part of) the effects. Specifically, although Steinhauer’s work is cited, they do not discuss the notion that the pupil response in darkness might be affected by a limited effect of the PNS on the pupil size in dark. Please include a discussion on the separate roles of PNS and SNS in the pupil response (and the effect of illumination)

4. The authors suggest that luminance effects on the pupil response to cognitive load may be a negative aspect, and that it may “hinder” (e.g., line 82, line 120) the pupillometry applications. Rather, such effects can be exploited in case these are meaningful (e.g. Steinhauers work). Please discuss the potential benefits of such approaches.

5. Line 68: a larger effect does not automatically indicate a predominant effect. Please rephrase.

6. Line 136 and line 386: treating the baseline as being associated with the “tonic LC state” is probably not appropriate in the current study, as the baseline was derived just before the actual sentence presentation. The tonic LC state is observed during rest. The participants are not passively listening but probably actively anticipating the target speech. See also Joshi, S., & Gold, J. I. (2020). Pupil size as a window on neural substrates of cognition. Trends in cognitive sciences, 24(6), 466-480.

7. Please provide the rationale and hypotheses for the (additional) analyses performed. Please also provide the rationale for the specific analyses performed on the behavioural data. Are these analyses robust against the fact that the behavioural performances were at ceiling levels?

8. The sentence duration needs to be provided in the main methods section. How is the varying sentence length accounted for with respect to the interval in which the pupil data were analysed? Was the interval adapted relative to the length of the shortest sentence? The additional analyses do not support this.

9. In many studies, the dynamic range of the pupil size that is used in the proportional baseline correction is specifically and separately measured (not during the actual task). This differs from the current approach. Could this have affected the results?

10. The resolution of the figures is poor, also when downloading the images directly from the portal.

11. The authors already interpret the data in the results section (e.g. lines 374-375). In addition, the authors could use more formal language throughout the paper (e.g. line 566: what is meant with “gripping”?).

12. Line 409: serperation typo

13. Lines 407-409: this is speculative; please indicate that this is not based on the current evidence.

Pan, J., Klímová, M., McGuire, J.T. et al. Arousal-based pupil modulation is dictated by luminance. Sci Rep 12, 1390 (2022). https://doi.org/10.1038/s41598-022-05280-1

Van der Stoep, N., Van der Smagt, M.J., Notaro, C. et al. The additive nature of the human multisensory evoked pupil response. Sci Rep 11, 707 (2021). https://doi.org/10.1038/s41598-020-80286-1

Madsen, J., Julio, S. U., Gucik, P. J., Steinberg, R., & Parra, L. C. (2021). Synchronized eye movements predict test scores in online video education. Proceedings of the National Academy of Sciences, 118(5).

Reilly, J., Kelly, A., Kim, S. H., Jett, S., & Zuckerman, B. (2019). The human task-evoked pupillary response function is linear: Implications for baseline response scaling in pupillometry. Behavior research methods, 51(2), 865-878.

Reviewer #2: This well-written and interesting work highlights a question of methodological significance in pupillometry (i.e., how should we account for variable lighting conditions?). The study was conducted with admirable rigor, and the data will contribute to our evolving understanding of how to optimize pupillometry and control for the all-important factor of luminance. This reviewer had several questions that the authors might wish to consider prior to this work:

One of the central claims of this work is that little is known about the interaction between task difficulty and luminance that might inform best practices in baseline correction. I would argue on this point that this is not entirely correct. The authors have missed several studies examining the effects of luminance perturbations on the magnitude of the task-evoked pupillary response function (Pan et al., 2022; Reilly et al., 2018). In addition, there are other past studies uncited (e.g., Bradshaw, 1969) that involved similar manipulations from a half century ago. It is reasonably well-accepted that linear (subtraction) is the most biologically plausible form of baseline pupil correction (as referenced by Mathot et al). It might be useful to make contact with some of the following sources on the effects of luminance:

Bradshaw, J. L. (1969). Background light intensity and the pupillary response in a reaction time task. Psychonomic Science, 14(6), 271–272. https://doi.org/10.3758/BF03329118

Pan, J., Klímová, M., McGuire, J. T., & Ling, S. (2022). Arousal-based pupil modulation is dictated by luminance. Scientific Reports, 12(1), 1390. https://doi.org/10.1038/s41598-022-05280-1

Reilly, J., Kelly, A., Kim, S. H., Jett, S., & Zuckerman, B. (2018). The human task-evoked pupillary response function is linear: Implications for baseline response scaling in pupillometry. Behavior Research Methods, 1–14.

Pg. 8, I do not know what ‘IEEE sentences’ are– consider writing out acronyms and providing accompanying references

Pg. 8, I do not know what ‘HINT sentences’ are – consider writing out acronyms and providing accompanying references

Packages such as 'GazeR' implement velocity-based blink detection and interpolation procedures with the idea that observations corresponding to partial eyelid closure are part of the blink. It is not clear how/why you implemented your particular choice of blink correction algorithms. Consider elaborating

It is not clear how you were able to obtain reliable pupil readings for the low (0 Lux) light condition. In our own past studies using luminance manipulations, we struggled to acquire reliable data in ‘pure’ dark even using an IR sensor with an Eyelink 1000. The eyetracker simply could not reliably discriminate pupil from iris in complete darkness. How did you shield the room from monitor luminance, etc. to achieve 0 Lux luminance?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Jamie Reilly

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Dec 2;17(12):e0278506. doi: 10.1371/journal.pone.0278506.r002

Author response to Decision Letter 0


19 Aug 2022

Reviewer #1: This manuscript describes a study of the effect of luminance and SNR on the peak pupil dilation response. Although the study methods are sound, the embedding of the study in the context of existing studies can be improved substantially. Please find my suggestions below.

1. Embedding in existing literature. The authors indicate that the study described in the manuscript is one of few studies addressing the combined influence of luminance and (auditory) task load on the pupil response (e.g. line 562). However, the authors did not include a rigorous description of current work. Specifically, the authors should include the studies cited below (and probably more relevant studies can easily be found) to provide a more comprehensive overview of the current state of knowledge. The current studies up to date suggest that the current results are inconclusive, namely some studies have found evidence for an effect of luminance on the PPD evoked by cognitive tasks, and some did not show this effect. The authors could expand the discussion about the factors that may be associated with this current inconclusive evidence.

We thank our reviewers and academic editor for pointing us to all the relevant literature. We realized that many of the suggested papers are very helpful in improving the overall quality of our paper. We have included these studies in the introduction and discussion to improve the review of past studies (line 88) and to discuss possible reasons for the inconsistencies in existing studies (lines 114-122, 411-430 and 440).

2. Currently, most published “guidelines” on pupillometry recommend to use a medium illumination level. How would the current results affect this recommendation? Importantly, the authors report a main effect of luminance, and the post-hoc analysis shows that the pupil response in bright conditions differs from that in medium level conditions. However, the difference between the dark and medium level condition is not significant. Regardless, the remaining part of the manuscript treats this absent effect as if this difference was indeed significant (e.g. line 395). Please adapt (why would you test the main effect if you don’t follow up on the results appropriately?).

We agree with the reviewer that the main effect of luminance was not thoroughly followed up. We have therefore added an extensive discussion of the results and of why some of them differ from past studies (i.e., larger at medium luminance than in both the dark and bright conditions) (lines 411-430). Based on our current results, medium luminance is still recommended in practice (line 591), and we added a note on how this might not be realistic in clinical settings with current setups and how this could be improved.

3. Related to the incomplete account of the current literature, the authors do not explain and discuss the potential physiological mechanisms that may account for (part of) the effects. Specifically, although Steinhauer’s work is cited, they do not discuss the notion that the pupil response in darkness might be affected by a limited effect of the PNS on the pupil size in dark. Please include a discussion on the separate roles of PNS and SNS in the pupil response (and the effect of illumination)

Indeed, we had not discussed the ANS mechanism sufficiently. We added a section in discussion to better explain/interpret our results (line411-440).

4. The authors suggest that luminance effects on the pupil response to cognitive load may be a negative aspect, and that it may “hinder” (e.g., line 82, line 120) the pupillometry applications. Rather, such effects can be exploited in case these are meaningful (e.g. Steinhauers work). Please discuss the potential benefits of such approaches.

Indeed, the complex relation between SNS and PNS can be experimentally controlled in a meaningful way (e.g., Steinhauer et al., 2004; Reilly et al., 2019) to answer interesting research questions. We have changed the tone when addressing the impact of luminance to be more neutral and added a note on how we can use the knowledge to benefit scientific research (line430). But we still keep and emphasize the central message of our paper that in the case of standardizing clinical pupillometry, factors like luminance and individual differences should be properly examined and controlled across clinics, in order to extract the pupillary responses to clinical stimuli of interest (i.e., standardized test words or sentences, pure tone or FM/AM tone for audiometry, visual stimuli etc) (line580). We think this message is more important (and also much-needed) for the field of cognitive hearing science.

5. Line 68: a larger effect does not automatically indicate a predominant effect. Please rephrase.

Thanks, we have rephrased ‘predominant’ to ‘having a larger impact’

6. Line 136 and line 386: treating the baseline as being associated with the “tonic LC state” is probably not appropriate in the current study, as the baseline was derived just before the actual sentence presentation. The tonic LC state is observed during rest. The participants are not passively listening but probably actively anticipating the target speech. See also Joshi, S., & Gold, J. I. (2020). Pupil size as a window on neural substrates of cognition. Trends in cognitive sciences, 24(6), 466-480.

Absolutely, thank you for pointing this out to us. We have removed the tonic/phasic state, and clarified that our baseline is not resting state baseline (line472).

7. Please provide the rationale and hypotheses for the (additional) analyses performed. Please also provide the rationale for the specific analyses performed on the behavioural data. Are these analyses robust against the fact that the behavioural performances were at ceiling levels?

We have added a section heading to better explain our rationale for performing additional analyses (line 451). We also refer more often to the additional analyses in the method and discussion sections to support their value (lines 265, 274, 410 and 581). We believe that this will make the additional analyses section more relevant and tie it better to our central message. Behavioural results were indeed close to ceiling, as intended, because participants had normal hearing and were relatively young. The logistic mixed effect model does take such an upper asymptote in performance into account and addresses the contrast better than applying transformations like arcsine-square-root (line 212).
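The ceiling argument can be illustrated numerically. The sketch below is not the mixed-effects model reported in the manuscript; it only compares how the logit link and the arcsine-square-root transform rescale the same gain in proportion correct near ceiling versus at mid-range. The proportions (0.50, 0.95) and the step of 0.04 are hypothetical values chosen for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def arcsine_sqrt(p: float) -> float:
    """Arcsine-square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


def stretch_ratio(transform, step: float = 0.04) -> float:
    """Size of a `step` gain near ceiling (0.95 -> 0.95 + step) relative to
    the same gain at mid-range (0.50 -> 0.50 + step), after `transform`."""
    near_ceiling = transform(0.95 + step) - transform(0.95)
    mid_range = transform(0.50 + step) - transform(0.50)
    return near_ceiling / mid_range
```

Under these assumptions, `stretch_ratio(logit)` exceeds `stretch_ratio(arcsine_sqrt)`, meaning the logit scale expands differences close to ceiling more strongly than the arcsine-square-root transform does.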

8. The sentence duration needs to be provided in the main methods section. How is the varying sentence length accounted for with respect to the interval in which the pupil data were analysed? Was the interval adapted relative to the length of the shortest sentence? The additional analyses do not support this.

The sentences remained unchanged and the analysis window contained the sentence duration plus a fixed 2 s waiting interval, starting at the offset of the sentence, for the pupil peak to emerge. We added further clarification on this point in line 270. Leaving sentence duration uncorrected is common in past listening effort studies because standardized sentences almost always differ slightly in duration (Winn et al., 2018, page 19). We also reported the 10th, 30th, 50th, 70th and 90th percentiles of the distribution of all sentences (line 509). This distribution gives a better indication of the sentence duration. This duration variability does not affect the averaged pupil response, as long as the time alignment of traces is done consistently (either by the onset or the offset of the sentence). We reported in the additional analyses the results of both types of alignment, to reassure readers about our methods and to explore whether a different alignment would change the results significantly. A formal analysis was also performed to examine systematically whether variability in sentence duration, albeit small, has any impact on calculating the task-evoked pupillary response.
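The analysis window described above (sentence duration plus a fixed 2 s wait) can be sketched for a single trial as follows. This is a minimal illustration, not the authors' Matlab/R pipeline: the function name, the pre-onset baseline length `baseline_s`, and the use of simple subtractive baseline correction are assumptions made for the example.

```python
from typing import Sequence, Tuple


def peak_pupil_dilation(trace: Sequence[float], fs: float, baseline_s: float,
                        sentence_s: float, wait_s: float = 2.0) -> Tuple[float, float]:
    """Return (peak dilation, peak latency in s from sentence onset).

    `trace` is one onset-aligned trial that starts `baseline_s` seconds
    before sentence onset and is sampled at `fs` Hz (assumed layout).
    """
    onset = int(round(baseline_s * fs))
    # Window = sentence duration + fixed waiting interval after sentence offset.
    end = min(len(trace), onset + int(round((sentence_s + wait_s) * fs)))
    baseline = sum(trace[:onset]) / onset
    # Subtractive correction (an assumption made for this sketch).
    window = [sample - baseline for sample in trace[onset:end]]
    peak = max(window)
    latency_s = window.index(peak) / fs
    return peak, latency_s
```

Because the window end depends on `sentence_s`, longer sentences get a proportionally longer search window, while samples after the 2 s wait are ignored.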

9. In many studies, the dynamic range of the pupil size that is used in the proportional baseline correction is specifically and separately measured (not during the actual task). This differs from the current approach. Could this have affected the results?

We agree that resting state baseline would change the dynamic range calculation and baseline-corrected pupil response. And such baseline could also be affected by luminance. Considering the arousal that would take place when getting ready for a task, we further speculate that the resting state baseline would be lower than that we considered (prior to a given block, or across a whole experiment). In other words, the dynamic range we showed is presumably underestimated, compared to a method which would take the resting-state baseline instead. But note that other phenomena could also affect resting baseline other than task readiness. In our protocol, we did not include resting state pupil recordings, therefore, it is difficult for us to make any informed claim in this regard. We clarified that our baseline is not a resting state baseline and that we could get different results if we had done so. (line472).
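The consequence of an underestimated dynamic range can be shown with a small worked example. The range-scaling formula below is an assumption for illustration and may differ from the correction used in the manuscript; all pupil sizes are hypothetical. The example holds the dilation fixed and varies only the range used for scaling.

```python
def range_scaled_response(pupil_mm: float, baseline_mm: float,
                          range_min_mm: float, range_max_mm: float) -> float:
    """Baseline-subtracted dilation expressed as a fraction of a dynamic range."""
    span = range_max_mm - range_min_mm
    if span <= 0:
        raise ValueError("range_max_mm must exceed range_min_mm")
    return (pupil_mm - baseline_mm) / span


# Hypothetical numbers: the same 0.5 mm dilation scaled by two ranges.
# A range estimated during the task (narrower) versus one that also
# includes a smaller resting-state pupil size (wider).
task_based = range_scaled_response(5.0, 4.5, range_min_mm=3.0, range_max_mm=6.0)
rest_based = range_scaled_response(5.0, 4.5, range_min_mm=2.5, range_max_mm=6.0)
```

With a narrower range the same dilation maps to a larger scaled value, so `task_based` is greater than `rest_based` in this example.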

10. The resolution of the figures is poor, also when downloading the images directly from the portal.

We have addressed this issue and uploaded images of correct resolution.

11. The authors already interpret the data in the results section (e.g. lines 374-375). In addition, the authors could use more formal language throughout the paper (e.g. line 566: what is meant with “gripping”?).

We have removed the data interpretation in the results section, except for one short sentence where we wish to highlight the key finding of the paper (line384). Also, we have improved the formality of the language to avoid using ‘gripping’ or other ambiguous words.

12. Line 409: serperation typo

The typo is now corrected.

13. Lines 407-409: this is speculative; please indicate that this is not based on the current evidence.

We agree; it is indeed a speculation based on the shape we obtained from the three luminance levels tested. We added possible support from the past literature (line 440), but neither of them provides direct support for this speculation, due to different test materials (Pan et al., 2022) and physiological biomarkers used (Slade et al., 2021). We have rephrased the sentence to highlight that this was only a speculation and more studies are needed to consolidate it.

Reviewer #2: This well-written and interesting work highlights a question of methodological significance in pupillometry (i.e., how should we account for variable lighting conditions?). The study was conducted with admirable rigor, and the data will contribute to our evolving understanding of how to optimize pupillometry and control for the all-important factor of luminance.

Thank you for your kind words. We’re glad you found it interesting.

This reviewer had several questions that the authors might wish to consider prior to this work:

One of the central claims of this work is that little is known about the interaction between task difficulty and luminance that might inform best practices in baseline correction. I would argue on this point that this is not entirely correct. The authors have missed several studies examining the effects of luminance perturbations on the magnitude of the task-evoked pupillary response function (Pan et al., 2022; Reilly et al., 2018). In addition, there are other past studies uncited (e.g., Bradshaw, 1969) that involved similar manipulations from a half century ago. It is reasonably well-accepted that linear (subtraction) is the most biologically plausible form of baseline pupil correction (as referenced by Mathot et al). It might be useful to make contact with some of the following sources on the effects of luminance:

Bradshaw, J. L. (1969). Background light intensity and the pupillary response in a reaction time task. Psychonomic Science, 14(6), 271 -272. https://doi.org/10.3758/BF03329118

Pan, J., Klímová, M., McGuire, J. T., & Ling, S. (2022). Arousal-based pupil modulation is dictated by luminance. Scientific Reports, 12(1), 1390. https://doi.org/10.1038/s41598-022-05280-1

Reilly, J., Kelly, A., Kim, S. H., Jett, S., & Zuckerman, B. (2018). The human task-evoked pupillary response function is linear: Implications for baseline response scaling in pupillometry. Behavior Research Methods, 1–14.

We are incredibly grateful for this insight and the many suggestions on the literature. We have integrated these papers in the introduction and discussion to improve the coverage of past studies (line88) and possible reasons that could have led to inconsistencies in the existing studies (line114-122, line411 -430, line440).

Pg. 8, I do not know what ‘IEEE sentences’ are– consider writing out acronyms and providing accompanying references

Pg. 8, I do not know what ‘HINT sentences’ are – consider writing out acronyms and providing accompanying references

Thanks for pointing out the acronym issue. We have added the corpus name and the reference to the materials.

Packages such as 'GazeR' implement velocity-based blink detection and interpolation procedures with the idea that observations corresponding to partial eyelid closure are part of the blink. It is not clear how/why you implemented your particular choice of blink correction algorithms. Consider elaborating

Thank you for making us aware of these packages. At the time of the experimentation and data analysis, GazeR was not published (Apr 2020) so it did not come to our attention. Instead, we followed pupillometry guidelines provided in Winn et al., 2008 and Mathôt et al., 2018 when constructing our pupil pre-processing pipeline. We reported in the Methods section the detailed steps and corresponding Matlab and R functions for analysis to promote reproducibility. Note that we also integrated a gaze-based exclusion rule (line 233), which differs from GazeR. We believe this addition could improve data quality, especially in settings like ours where participants looked at a very big screen and gaze wander-offs were common (more so than they might have been in another study with a typical laptop screen). Removing pupil measures with gaze too far away from the fixation cross can effectively control for inattentive moments.
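A gaze-based exclusion rule of this kind can be sketched as below. This is an illustrative sketch only, not the authors' implementation: the function name, the use of Euclidean distance, and the threshold `max_dist` are assumptions, since the actual criterion is given in the manuscript's Methods rather than here.

```python
import math
from typing import List, Optional, Sequence, Tuple


def exclude_off_fixation(pupil: Sequence[float],
                         gaze_xy: Sequence[Tuple[float, float]],
                         fixation_xy: Tuple[float, float],
                         max_dist: float) -> List[Optional[float]]:
    """Mark pupil samples as missing (None) when gaze is farther than
    `max_dist` from the fixation cross (same units as the gaze data)."""
    fx, fy = fixation_xy
    cleaned: List[Optional[float]] = []
    for size, (gx, gy) in zip(pupil, gaze_xy):
        distance = math.hypot(gx - fx, gy - fy)
        cleaned.append(size if distance <= max_dist else None)
    return cleaned
```

Samples marked as missing could then be handled by whatever interpolation or rejection step the pipeline already applies to blinks.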

It is not clear how you were able to obtain reliable pupil readings for the low (0 Lux) light condition. In our own past studies using luminance manipulations, we struggled to acquire reliable data in ‘pure’ dark even using an IR sensor with an Eyelink 1000. The eyetracker simply could not reliably discriminate pupil from iris in complete darkness. How did you shield the room from monitor luminance, etc. to achieve 0 Lux luminance?

Most likely, this is because it wasn’t exactly 0 lux. Please see the description of the experimental room setup (line 184), and you can find an actual photo of the setup below. The participants sat on a chair 2 m away from a large (35-inch) screen. In the dark condition, the room lights were turned off, and the screen luminance was set to its lowest level, just enough to signal participants. The room was indeed dark, but some dots of light were still emitted from the big screen. When we measured luminance with the TES-1335 luxmeter at participants’ eye level, 2 m away from the screen, the reading was zero because of the low screen light, the large distance from the light source and possibly the sensitivity of the luxmeter. We acknowledge that we did not shield all the light in the room, and agree with the reviewer that ‘complete darkness’ is not scientifically rigorous. We have therefore changed all occurrences of ‘complete darkness’ to ‘darkness’ or ‘close to 0 lux’.

Attachment

Submitted filename: rebuttal_letter .docx

Decision Letter 1

Sebastiaan Mathôt

22 Sep 2022

PONE-D-22-07841R1

Luminance effects on pupil dilation in speech-in-noise recognition

PLOS ONE

Dear Dr. Zhang,

Thank you for submitting your revised manuscript to PLOS ONE. Both of the original reviewers read the revision and you will be happy to see that they are almost satisfied with it, although they both raise a few final comments. I invite you to respond to these comments in a second revision. I do not anticipate sending the manuscript out for review again, but of course I reserve the right to do so if for some reason I feel that this is necessary. A small practical request from my side: In the previous revision, you kept several layers of track changes, which I imagine reflects the back-and-forth between the various authors. However, this is really confusing for the reviewers and myself! Could you indicate the changes in a simpler way, for example with a single layer of track changes or by highlighting the relevant parts of the text?

Please submit your revised manuscript by Nov 06 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Sebastiaan Mathôt, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Review of PONE-D-22-07841_R1 “Luminance effects on pupil dilation in speech-in-noise recognition”.

The authors have improved the readability of the manuscript and better acknowledge the current state of literature in the introduction and discussion sections. I still have a few comments and questions:

Readability: The manuscript still contains quite a few typos and non-optimal phrases. I list a couple of them below, but there are probably more. Please carefully check the quality of the English text. Examples: line 46 “bigger and bigger”; line 71: this sentence doesn’t read fluently; line 75: “dark lighting” is a bit confusing, line 84: “it”: not clear what it refers to; line 105: “staining”, line 136: “they” should be there, line 140: “on” is missing, line 236: “as” should be and, line 256: add “rate”, line 296: this line refers to a result that hasn’t been presented yet, which is confusing; line 337: “our end goal” is not a clear phrase; line 339: sentence is not fluent, line 346: “disregarded” does not fit in the sentence; line 352: “massive”: informal language; which also holds for “little larger” in the next line.

Language: The authors describe that two types of sentences were used, either IEEE or HINT, depending on the native tongue of the participant. Please describe the characteristics of these sentence sets in more detail, as (differences in) complexity, sentence structure, sentence length etc. can all affect the performance and pupil response. This is especially relevant in relation to the analysis of the effect of sentence duration (was sentence set confounding, i.e. was the distribution of sentence duration the same for both sets?) and the determination of the analysis window. The analysis would benefit from an explicit comparison of the effect of sentence set / language to check whether it affected the dependent measures (even though the limited sample size in such a comparison probably reduces the chances of observing an effect). Line 299 and line 528: Please explain what is meant with “with different speech materials”? Did the listeners perceive another (third) set of sentences? Please describe the details.

Line 311: Why do the authors refer to the current task as a “non-demanding” task? Please explain.

Figure 2: to me, it is not clear why the pupil response in quiet, which seems relatively large in Figure 2 (plot showing the response over time) is actually relatively small in the plot showing mean PPD. What is causing the relative difference between the size of the effect between these plots? How do the authors explain the relatively large pupil response in the most easy condition?

Absent interaction between SNR and luminance: none of the many analyses found an interaction effect between SNR and luminance. However, the wording of some sentences is a bit confusing and seems to refer to such an interaction. For example: line 392: “interact”: please rephrase.

Discussion: The way in which additional analyses and results are reported in the discussion section is not usual and not introduced properly. Consider moving these sections to the results section.

Reviewer #2: The authors have addressed many of this reviewer's original concerns but might consider the following issues:

1) Effect size measurements should appear in conjunction with relevant statistics to promote replication.

2) It is unclear why the authors chose to implement proportional baseline scaling. This technique is not recommended as a baseline correction procedure as it can significantly distort the magnitude of evoked pupil dilation at high/low levels of the dynamic range. For example, an evoked change of .1mm from a 1mm baseline is a much higher %change relative to the same absolute change (.1mm) from a 9mm baseline. Several recent papers have addressed this issue along with recommendations for subtractive baseline scaling.

3) It is reasonably well established that at very high ends of the dynamic range of the pupil (e.g., intense light, absolute darkness), pupillary movements become idiosyncratic. In addition, eye trackers often become unreliable in these conditions because of challenges in contrasting pupil from iris. It was not clear what the overarching recommendations were here regarding control for luminance. In moderate ambient lighting conditions as most labs might encounter (e.g., fluorescent lighting) practical variability would not be too high (assuming testing in a windowless room).

4) I had difficulty finding links to a data repository to examine stimuli, etc.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Jamie Reilly

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Dec 2;17(12):e0278506. doi: 10.1371/journal.pone.0278506.r004

Author response to Decision Letter 1


6 Nov 2022

We thank the two reviewers and the academic editor for their constructive comments which helped us improve this manuscript. We have made several edits to further polish this submission.

We apologize for the double edits in the earlier revision and fixed it in this new version to avoid confusion.

Note that line numbers are referenced to the clean version of this revision (not with tracked changes).

Reviewer #1: Review of PONE-D-22-07841_R1 “Luminance effects on pupil dilation in speech-in-noise recognition”.

The authors have improved the readability of the manuscript and better acknowledge the current state of literature in the introduction and discussion sections. I still have a few comments and questions:

Readability: The manuscript still contains quite a few typos and non-optimal phrases. I list a couple of them below, but there are probably more. Please carefully check the quality of the English text. Examples: line 46 “bigger and bigger”; line 71: this sentence doesn’t read fluently; line 75: “dark lighting” is a bit confusing, line 84: “it”: not clear what it refers to; line 105: “staining”, line 136: “they” should be there, line 140: “on” is missing, line 236: “as” should be and, line 256: add “rate”, line 296: this line refers to a result that hasn’t been presented yet, which is confusing; line 337: “our end goal” is not a clear phrase; line 339: sentence is not fluent, line 346: “disregarded” does not fit in the sentence; line 352: “massive”: informal language; which also holds for “little larger” in the next line.

We thank the reviewer for pointing out these non-optimal wordings and typos. We have made the corresponding changes and carried out additional checks for typos (lines 34, 46, 71, 84, 105, 140, 236, 256, 296, 334, 337, 346, 352, 355).

Language: The authors describe that two types of sentences were used, either IEEE or HINT, depending on the native tongue of the participant. Please describe the characteristics of these sentence sets in more detail, as (differences in) complexity, sentence structure, sentence length etc. can all affect the performance and pupil response. This is especially relevant in relation to the analysis of the effect of sentence duration (was sentence set confounding, i.e. was the distribution of sentence duration the same for both sets?) and the determination of the analysis window. The analysis would benefit from an explicit comparison of the effect of sentence set / language to check whether it affected the dependent measures (even though the limited sample size in such a comparison probably reduces the chances of observing an effect). Line 299 and line 528: Please explain what is meant with “with different speech materials”? Did the listeners perceive another (third) set of sentences? Please describe the details.

We thank the reviewer for requesting more detail about the sentences used. We added clarification in the Methods section and in the Additional analyses that the sentence duration distribution was calculated over all the sentences, and that the analysis window runs from the sentence onset to the repeat prompt (line 291, line 511). We also clarified at line 229 and line 528 that the ‘different’ materials referred to sentences not used in the previous experiment.

We agree that the HINT and the IEEE materials are different, not only in their length but also in their semantic complexity. The analysis presented in the Additional analyses deals with this variable to some extent, since effects of duration are pooled across the two materials: French sentences were more represented in the long and very-long bins, and English sentences in the short and very-short bins. However, we do not wish to draw the reader's attention to this language-induced difference, because it is not a factor of interest in our scientific hypothesis, and it is not clear that the effect of language applies differentially to French and English participants. In other words, French participants may be used to relatively longer sentences than English participants (because of the way each language is constructed), so the effect of duration might apply over different ranges in each language. This speculation is something for future empirical studies to resolve. We do not have the statistical power in this study to examine it properly, considering the number of main-effect factors we are already investigating.

What we ensured in the experimental design is that each participant performed the task in their native language, so that no extra cognitive effort was required for processing a non-native language. In this regard, the language of the materials was irrelevant. None of the present findings are about cognitive load due to lack of mastery of the materials. Therefore, effects of duration (and necessarily materials) have been postponed to the very end of the discussion (last section of Additional analyses) to make sure that the reader focuses on the take-home message of the paper, which is about the effect of light and its interaction with SNR.

Line 311: Why do the authors refer to the current task as a “non-demanding” task? Please explain.

Speech-in-noise tasks conducted between 0 and 14 dB SNR are generally not considered difficult for normal-hearing (NH) listeners. The speech intelligibility performance is high (>90%) and approaching ceiling (line 292). The wording “non-demanding” is consistent with previous literature on NH listeners.

Figure 2: to me, it is not clear why the pupil response in quiet, which seems relatively large in Figure 2 (plot showing the response over time) is actually relatively small in the plot showing mean PPD. What is causing the relative difference between the size of the effect between these plots? How do the authors explain the relatively large pupil response in the most easy condition?

This difference is due to the inherent variability in the individual pupil traces. When averaging traces that peak at slightly different times, the resulting average is smoothed down and can have a peak latency that also differs slightly from the average of the individual peak latencies. That is true of pupillometry just as it is of many other neurophysiological techniques. For the PPDs shown in the error-bar plots and used in the statistical analysis, we calculated one PPD from the aggregate of the 20 traces in a list; those more stereotypical traces were then entered into the statistics and used to calculate the grand-average trace. This is similar to the difference between calculating PPD from single trials and from list aggregation (the PPD of a trace aggregated from 20 traces), reported in the Supplementary Materials.
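The smoothing effect of averaging traces with different peak latencies can be shown with a toy example. The Gaussian-shaped traces, their width and the peak times below are purely illustrative assumptions and are not taken from the study's data; every simulated trace has the same peak height of 1.

```python
import math

def toy_trace(peak_time, times, width=0.5):
    """Gaussian-shaped toy 'pupil trace' with unit peak height (an assumption,
    not a model of the real pupillary response)."""
    return [math.exp(-((t - peak_time) ** 2) / (2 * width ** 2)) for t in times]

times = [i * 0.01 for i in range(501)]   # 0 to 5 s in 10 ms steps
peak_times = [1.5, 2.0, 2.5, 3.0]        # hypothetical peak latencies (s)
traces = [toy_trace(pt, times) for pt in peak_times]

# Average of the individual peaks: every trace peaks at (about) 1.
mean_of_peaks = sum(max(tr) for tr in traces) / len(traces)

# Peak of the point-by-point average trace: lower, because the peaks
# do not line up in time.
average_trace = [sum(vals) / len(vals) for vals in zip(*traces)]
peak_of_average = max(average_trace)

print(f"mean of individual peaks: {mean_of_peaks:.3f}")
print(f"peak of the average trace: {peak_of_average:.3f}")
```

With these made-up latencies the peak of the averaged trace is clearly smaller than the mean of the individual peaks, which is the kind of discrepancy between a time-course plot and a mean-PPD plot discussed above.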

In response to the latter question, the quiet condition did not differ significantly from +14 or even +7 dB; it’s really at 0 dB SNR that PPD amplitude and latency started to differ. So, we didn’t delve into this speculation.

Absent interaction between SNR and luminance: none of the many analyses found an interaction effect between SNR and luminance. However, the wording of some sentences is a bit confusing and seems to refer to such an interaction. For example: line 392: “interact”: please rephrase.

This is essentially because we suspect the interaction to exist but could not demonstrate it. For the sake of rigor, we clarified (start of the second paragraph of the discussion of Exp2) that: “The lack of significant interaction between luminance and SNR conditions in our results show that this bias is relatively consistent across SNRs”. However, in the rest of that paragraph, we did speculate that an interaction might exist, given a more global understanding of the pattern in our data as well as the existing literature.

Discussion: The way in which additional analyses and results are reported in the discussion section is not usual and not introduced properly. Consider moving these sections to the results section.

We have considered the re-organisation. However, the current format best suits the logic of the paper. The Results section directly reflects our answers to the initial hypothesis. Then, upon checking the results, further analyses were required to better understand the possible reasons for our rather surprising results, i.e., we did not expect luminance to have such a large impact on the task-evoked response. Moving the additional analyses to the Results section might confuse readers about the central message of our article and the flow of scientific hypothesis testing.

Reviewer #2: The authors have addressed many of this reviewer's original concerns but might consider the following issues:

1) Effect size measurements should appear in conjunction with relevant statistics to promote replication (mean diff, pooled std of the groups).

We thank the reviewer for pointing this out. We have added group means and standard deviations to the results section where relevant (lines 289, 325, 384, 391, 425).

2) It is unclear why the authors chose to implement proportional baseline scaling. This technique is not recommended as a baseline correction procedure as it can significantly distort the magnitude of evoked pupil dilation at high/low levels of the dynamic range. For example, an evoked change of .1mm from a 1mm baseline is a much higher %change relative to the same absolute change (.1mm) from a 9mm baseline. Several recent papers have addressed this issue along with recommendations for subtractive baseline scaling.

We did not choose proportional baseline scaling but a subtractive method. This was exactly the point of the section ‘Effect of PPD calculation methods’ in the Additional analyses. In the main results (line 263), we used the subtractive method for exactly the reasons put forward by the reviewer.
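The reviewer's arithmetic can be reproduced with a short sketch comparing the two baseline corrections. The function names are hypothetical and the values are the reviewer's illustrative numbers (a 0.1 mm evoked change from a 1 mm or a 9 mm baseline), not measurements from the study.

```python
def subtractive_change(peak_mm, baseline_mm):
    """Evoked dilation as an absolute difference from baseline (mm)."""
    return peak_mm - baseline_mm

def proportional_change(peak_mm, baseline_mm):
    """Evoked dilation as a percentage of the baseline diameter."""
    return 100.0 * (peak_mm - baseline_mm) / baseline_mm

results = {}
for baseline_mm in (1.0, 9.0):
    peak_mm = baseline_mm + 0.1          # same absolute change in both cases
    results[baseline_mm] = (
        subtractive_change(peak_mm, baseline_mm),
        proportional_change(peak_mm, baseline_mm),
    )

for baseline_mm, (diff_mm, pct) in results.items():
    print(f"baseline {baseline_mm} mm: +{diff_mm:.2f} mm, {pct:.1f}% change")
```

The subtractive measure is about 0.1 mm in both cases, whereas the proportional measure is roughly 10% for the small baseline and roughly 1.1% for the large one, which is the distortion the reviewer describes.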

3) It is reasonably well established that at very high ends of the dynamic range of the pupil (e.g., intense light, absolute darkness), pupillary movements become idiosyncratic. In addition, eye trackers often become unreliable in these conditions because of challenges in contrasting pupil from iris. It was not clear what the overarching recommendations were here regarding control for luminance. In moderate ambient lighting conditions as most labs might encounter (e.g., fluorescent lighting) practical variability would not be too high (assuming testing in a windowless room).

We agree with the reviewer and, in summary, recommend moderate ambient lighting for pupillometry experiments (line 596). Our paper serves as a scientific investigation of how, in low and high light, the effect of light goes beyond purely physiological constraints and also bears on the task-evoked pupillary response. Previous studies acknowledged in the introduction had suggested that the relation between task-evoked and light-evoked pupillary responses is complex and not completely consistent. Our study therefore contributes to this line of work by investigating the relation between the two types of pupillary response. We also carried out further analyses, not done in previous studies, to examine possible confounds from other factors in the experimental design. Altogether, we believe that we have presented a rigorous scientific case that luminance control is very important in pupillometry experiments and should be emphasized more in methods reporting and experimental design.

4) I had difficulty finding links to a data repository to examine stimuli, etc.

To ease access to the data, we provided a zip file of all behavioural and raw pupil data with the submission. The zip file should appear at the end of the manuscript, visible to the reviewer. However, we do not hold the copyright to share the sentence stimuli (IEEE and HINT sentences); they were used in the experiment only with the authorization of the owner and distributor of the materials.

Attachment

Submitted filename: rebuttal_letter.docx

Decision Letter 2

Sebastiaan Mathôt

18 Nov 2022

Luminance effects on pupil dilation in speech-in-noise recognition

PONE-D-22-07841R2

Dear Dr. Zhang,

Thank you for submitting the revision of your manuscript. I did not send it out for review again, but rather checked myself whether all issues were addressed. And it is my pleasure to inform you that yes they have, and so your manuscript is herewith accepted for publication! I did still notice a handful of typos. My suggestion would be to wait until you receive the proofs, and then correct these. Thank you for contributing a valuable manuscript!

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Sebastiaan Mathôt, Ph.D.

Academic Editor

PLOS ONE

Acceptance letter

Sebastiaan Mathôt

25 Nov 2022

PONE-D-22-07841R2

Luminance effects on pupil dilation in speech-in-noise recognition

Dear Dr. Zhang:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Sebastiaan Mathôt

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Appendix. Results using trial-based approach for calculating PPD.

    (DOCX)

    S2 Appendix. Comparing pupil responses for correctly and incorrectly repeated sentences.

    (DOCX)

    S1 File

    (7Z)

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.


    Articles from PLOS ONE are provided here courtesy of PLOS
