Abstract
Use of speech signals and background noise is emerging in cortical auditory evoked potential (CAEP) studies; however, the interaction between signal type and noise level remains unclear. Two experiments examined how signal type and signal-to-noise ratio (SNR) interact in their effects on CAEPs. Three signals (syllable /ba/, 1000-Hz tone, and the /ba/ envelope with 1000-Hz fine structure) were presented at varying SNRs, demonstrating signal-by-SNR interactions due to both envelope and spectral characteristics. When real-world stimuli such as speech are used to evoke CAEPs, their temporal and spectral complexity leads to responses that differ from those to traditional tonal stimuli, especially in background noise.
1. Introduction
Real-world speech stimuli are being used more often in cortical auditory evoked potential (CAEP) studies as a means to make laboratory testing more applicable to everyday listening conditions (Martin et al., 2008). In addition, background noise has been added to CAEP testing to represent difficult environments commonly experienced by listeners (e.g., Billings et al., 2011). Speech stimuli presented in background noise allow for more direct comparisons with behavioral speech-in-noise testing results.
Signal type has been found to affect CAEP morphology. Relative to tonal signals, speech signals generally result in longer CAEP latencies and larger amplitudes (Tiitinen et al., 1999; Čeponiené et al., 2001; Swink and Stuart, 2012). Presumably, slower-onset speech stimuli result in longer latencies, while more complex harmonic structure leads to increased amplitudes in some cases. Interestingly, Swink and Stuart (2012) found mixed amplitude results when comparing a 723-Hz tone burst and the vowel /a/: the tonal stimulus resulted in smaller P1 amplitude but larger P2 amplitude. The direction of the amplitude effect may therefore depend on the specific onset characteristics of the speech token that is used.
Background noise level also affects CAEP morphology. Latencies are generally longer and amplitudes are smaller compared with conditions where no background noise is present (e.g., Michalewski et al., 2009; Billings et al., 2009), although there are some limited exceptions in which larger amplitudes are found in background noise (i.e., binaural presentation at fast rates; Papesh et al., 2015). When signal-to-noise ratio (SNR) is decreased, by systematically increasing noise level or decreasing the signal level, longer latencies and smaller amplitudes result (Michalewski et al., 2009; Billings et al., 2015).
However, the interaction between signal type and SNR has not been clearly demonstrated. It is likely that differing onset characteristics, specifically stimulus rise time characteristics within the first 50 ms, interact with SNR to modify latency and amplitude values (Onishi and Davis, 1968).
With the long-term goal of furthering the potential use of CAEPs in diagnosing and rehabilitating speech-understanding-in-noise difficulties, two experiments were completed to explore how the envelope and spectral characteristics of signal onsets affect CAEPs. We hypothesized that there would be an interaction between signal type and SNR and that the interaction would be driven by the acoustic characteristics of the signals, such as the onset envelope differences between the speech and tonal signals.
2. Experiment 1: SNR and signal type effects
2.1. Methods
Fifteen young normal-hearing individuals were tested (mean age = 27.6 years; 7 males and 8 females). Participants were all right handed and had normal hearing [less than or equal to 25 dB hearing level (HL)] with normal tympanometric measures (single admittance peak between ±50 daPa to a 226-Hz tone). All participants were in good general health and provided informed consent.
Signals consisted of a 1000-Hz tone and the syllable /ba/. Both signals had a duration of 450 ms and the tone signal had rise/fall times of 7.5 ms. The /ba/ speech signal was a naturally produced female exemplar shortened to a length of 450 ms by windowing the steady vowel offset portion of the token. Signals were presented at a sound pressure level (SPL) of 80 dBC (C-weighted). Speech spectrum continuous noise, created from the long-term spectrum of concatenated sentences spoken by a female talker, was added to the background and scaled to create five SNRs: 35, 25, 15, 5, and −5 dB. Stimuli were presented to the right ear using an Etymotic ER2 insert earphone.
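The SNR scaling described above can be illustrated with a minimal sketch. It assumes that SNR is defined from unweighted root-mean-square (RMS) levels, whereas the study specified levels in dBC; the sampling rate, the Gaussian stand-in for the speech-spectrum noise, and the function names `rms` and `scale_noise_to_snr` are hypothetical and are not taken from the original methods.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def scale_noise_to_snr(signal, noise, snr_db):
    """Return a copy of `noise` scaled so that
    20*log10(rms(signal) / rms(scaled noise)) equals `snr_db`.
    The signal level is left untouched, as in the study design."""
    target_noise_rms = rms(signal) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [gain * x for x in noise]


# Hypothetical stand-ins for the real stimuli.
FS_HZ = 16000                      # assumed sampling rate
rng = random.Random(0)             # seeded so the sketch is repeatable
tone = [math.sin(2 * math.pi * 1000 * n / FS_HZ)
        for n in range(int(0.450 * FS_HZ))]        # 450-ms, 1000-Hz tone
noise = [rng.gauss(0.0, 1.0) for _ in range(FS_HZ)]  # 1 s of white noise

achieved_snrs = {}
for snr_db in (35, 25, 15, 5, -5):  # the five SNRs of experiment 1
    scaled = scale_noise_to_snr(tone, noise, snr_db)
    achieved_snrs[snr_db] = 20 * math.log10(rms(tone) / rms(scaled))
    print(f"target {snr_db:>3} dB -> achieved {achieved_snrs[snr_db]:.2f} dB")
```

Because only the noise is rescaled, the signal stays at a fixed level across conditions, which mirrors the fixed 80-dBC presentation level reported above.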
Data presented in this study were collected as part of a larger project investigating the importance of signal level and SNR (Billings et al., 2015), necessitating that tone-evoked recordings and /ba/-evoked recordings be completed during separate sessions with order randomized. SNR condition presentation order was randomized within each session. Each signal condition was presented in a homogeneous train with simultaneous continuous background noise and a stimulus onset asynchrony of 2350 ms (i.e., onset to onset). Subjects were instructed to ignore the signals and watch a silent closed-captioned movie of their choice. They were also asked to minimize head and body movement. At least 150 signal presentations were recorded for each stimulus condition. Evoked potential activity was recorded using a 64-channel Electro-Cap International, Inc. tin-electrode cap and Neuroscan™ SynampsRT amplifiers. The ground electrode was located on the forehead and the online reference electrode was located at vertex. The recording window consisted of a 200-ms pre-stimulus period and a 1100-ms post-stimulus time. Evoked responses were collected using an analog low-pass filter at 100 Hz, a gain of 500, and a sampling rate of 1000 Hz. Eye blink artifacts were corrected off-line using a spatial filter based on the covariance of blink activity across the scalp. Remaining artifacts exceeding ±70 μV resulted in a rejected trial. The remaining sweeps were averaged and re-referenced to an average reference and then filtered from 0.1 Hz (high-pass filter, 12 dB/octave) to 30 Hz (low-pass filter, 12 dB/octave).
Waves P1, N1, and P2 were analyzed at electrode site Cz. Peak amplitudes were calculated relative to the pre-stimulus baseline, and peak latencies were calculated relative to stimulus onset (i.e., 0 ms). Latency and amplitude values for each wave were determined by agreement of two judges based on temporal electrode inversion, two sub-averages made up of even and odd trials, and global field power (GFP) traces. Rectified GFP area amplitude between 50 and 500 ms was also used as an overall measure of waveform robustness and as a general measure of the response recorded across all 64 channels. Mean amplitude, latency, and area measures were modeled statistically using a linear mixed model to determine effects of signal type and SNR on physiological measures.
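As an illustration of the trial rejection and averaging steps, the following sketch operates on single-channel epochs stored as plain lists of microvolt values. It covers only amplitude-based rejection, averaging, and baseline correction; blink correction, re-referencing, and band-pass filtering are omitted. The ordering of rejection and baseline correction, the function names, and the toy data are assumptions made for illustration.

```python
FS_HZ = 1000                     # sampling rate from the methods
PRE_MS, POST_MS = 200, 1100      # recording window from the methods
N_PRE = PRE_MS * FS_HZ // 1000                  # 200 pre-stimulus samples
N_SAMPLES = (PRE_MS + POST_MS) * FS_HZ // 1000  # 1300 samples per epoch
REJECT_UV = 70.0                 # rejection threshold (+/- 70 microvolts)


def baseline_correct(epoch, n_pre=N_PRE):
    """Subtract the mean of the pre-stimulus period from every sample."""
    baseline = sum(epoch[:n_pre]) / n_pre
    return [v - baseline for v in epoch]


def average_accepted(epochs, reject_uv=REJECT_UV):
    """Average the epochs whose absolute amplitude never exceeds
    `reject_uv`, then baseline-correct the average.
    Returns (averaged epoch, number of epochs kept)."""
    kept = [ep for ep in epochs if max(abs(v) for v in ep) <= reject_uv]
    if not kept:
        raise ValueError("every epoch exceeded the rejection threshold")
    average = [sum(column) / len(kept) for column in zip(*kept)]
    return baseline_correct(average), len(kept)


# Toy data (hypothetical): two clean epochs and one with a 120-uV artifact.
clean_a = [1.0] * N_PRE + [5.0] * (N_SAMPLES - N_PRE)
clean_b = [3.0] * N_PRE + [9.0] * (N_SAMPLES - N_PRE)
artifact = [0.0] * N_SAMPLES
artifact[600] = 120.0

erp, n_kept = average_accepted([clean_a, clean_b, artifact])
print(n_kept, erp[0], erp[-1])
```

In the toy data the third epoch is discarded, and the averaged response is expressed relative to its own pre-stimulus mean.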
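The GFP area measure can be sketched as follows. The sketch assumes the common definition of GFP as the standard deviation across channels at each time point and approximates the area by a rectangle-rule sum; the exact computation used in the study is not specified, so both choices, along with the function names and toy data, are assumptions.

```python
import statistics


def global_field_power(channels):
    """GFP per time point: population standard deviation across channels.
    `channels` is a list of equal-length sample lists (microvolts)."""
    return [statistics.pstdev(column) for column in zip(*channels)]


def gfp_area(gfp, start_ms, end_ms, fs_hz=1000, pre_ms=200):
    """Rectangle-rule area under the GFP trace between `start_ms` and
    `end_ms` (relative to stimulus onset), in microvolt-milliseconds.
    The epoch is assumed to begin `pre_ms` before stimulus onset."""
    dt_ms = 1000.0 / fs_hz
    first = round((pre_ms + start_ms) / dt_ms)
    last = round((pre_ms + end_ms) / dt_ms)
    return sum(gfp[first:last]) * dt_ms


# Toy data (hypothetical): two channels of opposite polarity, 1300 samples.
n_samples = 1300
channels = [[+1.0] * n_samples, [-1.0] * n_samples]

gfp = global_field_power(channels)
area = gfp_area(gfp, 50, 500)   # 50 to 500 ms window from the methods
print(area)
```

Because GFP is a standard deviation it is non-negative by construction, so no separate rectification step is needed under this definition.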
2.2. Results and discussion
Grand averaged waveforms at electrode Cz for both signal types at various SNRs are presented in Fig. 1. Generally, as SNR decreases, latencies increase and amplitudes decrease for both the tone and /ba/. Of note, it appears that /ba/ latencies increase with decreasing SNRs at a faster rate than tone latencies.
Fig. 1.
(Color online) Grand averaged (n = 15) waveforms at various SNRs for speech (solid blue line) and tone (dashed red line) signals presented at 80 dB. In response to both /ba/ and tone signals, the P1-N1-P2 peaks increase in latency and decrease in amplitude with decreasing SNR. An interaction between signal type and SNR is evident as /ba/ latencies increase more rapidly than tone latencies as SNR decreases.
The analysis of latency results demonstrated significant main effects of SNR and signal type for P1, N1, and P2 peaks (P1 SNR: F(4,125) = 23.0, p < 0.0001; P1 signal type: F(1,125) = 14.6, p = 0.0002; N1 SNR: F(4,125) = 144.8, p < 0.001; N1 signal type: F(1,125) = 120.0, p < 0.001; P2 SNR: F(4,125) = 34.1, p < 0.0001; P2 signal type: F(1,125) = 43.2, p < 0.0001). In contrast, main effects on amplitude were mixed: main effects of SNR were found for N1 and P2 only (N1: F(4,70) = 25.1, p < 0.0001; P2: F(4,70) = 20.5, p < 0.0001), while main effects of signal type were not significant for any peak amplitude measure. For signal type by SNR interactions on latency, significant interactions were found for P1 and N1 (P1: F(4,125) = 5.8, p = 0.0003; N1: F(4,125) = 27.5, p < 0.0001) but not for P2. There were no significant interaction effects for any amplitude measure. Finally, area measures for GFP showed significant effects of SNR, type, and type-by-SNR interaction (SNR: F(4,70) = 23.1, p < 0.0001; type: F(1,69) = 30.4, p < 0.0001; interaction: F(4,69) = 8.0, p < 0.0001), indicating that these effects are not limited to the Cz electrode site.
There may be several contributing factors to the SNR-by-signal-type interactions demonstrated in experiment 1. A simple explanation may have to do with the differing onset characteristics of the 1000-Hz tone and the syllable /ba/. The rise time of the tone is very short (i.e., 7.5 ms) relative to the slower rise time of the syllable /ba/. It is likely that a slower rise will be differentially masked compared to an abrupt rise as the SNR decreases. Figure 2 demonstrates how the differential masking may take place. In quiet, or a low-noise condition, signal onsets for the /ba/ and tone may occur at comparable times, resulting in similar neural response latencies (Fig. 2, arrows in left panel). However, as noise is added and the SNR decreases (Fig. 2, middle and right panels), more of the onset portion of the slowly rising speech signal is masked than the abruptly rising tone signal, potentially resulting in larger delays in neural response to /ba/, as indicated by the arrows. To test this hypothesis, experiment 2 was conducted to investigate whether signal rise characteristics were responsible for differences in CAEP responses found in experiment 1.
Fig. 2.
(Color online) Time waveforms for a subset of experiment 1 SNRs: 35 (left), 15 (middle), and 5 dB (right). The acoustic onsets of both the /ba/ and tone are increasingly masked as SNR decreases (i.e., as noise level increases). Arrows indicate how the hypothetical onset of audibility, and resulting neural responses, might differ across signals, such that onsets are increasingly delayed for the /ba/ relative to the tone (i.e., an interaction between signal type and SNR).
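The differential-masking hypothesis can be made concrete with a toy acoustic model. The sketch below treats each onset as a linear amplitude ramp and asks when the ramp first reaches a fixed criterion level relative to the noise. Only the 7.5-ms tone rise time comes from the methods; the 50-ms speech rise time, the −10-dB criterion, the linear ramp shape, and the function name `audible_onset_ms` are hypothetical, and the model says nothing about neural latency itself.

```python
TONE_RISE_MS = 7.5      # tone rise time from the methods
SPEECH_RISE_MS = 50.0   # hypothetical rise time standing in for /ba/
CRITERION_DB = -10.0    # hypothetical audibility criterion re: noise level


def audible_onset_ms(rise_ms, snr_db, criterion_db=CRITERION_DB):
    """Time (ms) at which a linear ramp from 0 to full amplitude over
    `rise_ms` first reaches `criterion_db` relative to the noise level.
    Returns None if even the full-amplitude signal stays below criterion."""
    # Fraction of full amplitude the ramp must reach to meet the criterion.
    fraction = 10 ** ((criterion_db - snr_db) / 20)
    if fraction > 1.0:
        return None
    return rise_ms * fraction


lags = {}
for snr_db in (35, 15, 5):   # the SNRs shown in Fig. 2
    tone_t = audible_onset_ms(TONE_RISE_MS, snr_db)
    speech_t = audible_onset_ms(SPEECH_RISE_MS, snr_db)
    lags[snr_db] = speech_t - tone_t
    print(f"SNR {snr_db:>2} dB: tone {tone_t:.2f} ms, "
          f"speech {speech_t:.2f} ms, lag {lags[snr_db]:.2f} ms")
```

Under these assumptions the lag of the slower ramp behind the faster one grows as SNR decreases, which is the qualitative pattern the arrows in Fig. 2 describe.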
3. Experiment 2: Envelope effects
Stimuli from experiment 1 were used along with a third stimulus consisting of a 1000-Hz tone filtered using the envelope of the spoken /ba/ (see methods below). Testing with this stimulus was performed to investigate whether differences in responses between the tonal and speech stimuli were due to the wider and more complex frequency spectrum of speech or simply the longer rise time of the /ba/ (envelope characteristics). If responses to the filtered tone are similar to those to the /ba/, then one could conclude that envelope is the key factor, whereas if responses to the filtered tone are more similar to those to the tone, then overall spectral characteristics of the /ba/ would be responsible for any difference.
3.1. Methods
Five young normal-hearing individuals participated in this experiment (mean age = 28.2 years; 3 males and 2 females). Three of these were also participants in experiment 1. All subjects met the same inclusion criteria and underwent the same screening and informed consent procedures as experiment 1.
Three different signals were used, consisting of an envelope signal in addition to the 1000-Hz tone and spoken /ba/ from experiment 1. The envelope signal was created by half-wave rectifying and then low-pass filtering the /ba/ signal at 50 Hz using a fourth order Butterworth filter, leaving primarily envelope information (Rosen, 1992). The temporal envelope was then applied to the 1000-Hz tone, resulting in the temporal envelope of the /ba/ signal with 1000-Hz fine structure, hereafter called the envelope signal. All three signals were presented at 80 dBC SPL. The background noise was identical to that used in experiment 1 and was scaled to result in SNRs of 15, 5, and −5 dB. A “quiet” condition (i.e., no background noise) was also included, resulting in four background noise conditions and 12 total conditions.
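The construction of the envelope signal can be sketched in a few steps: half-wave rectify the speech waveform, low-pass filter at 50 Hz with a fourth-order Butterworth filter, and multiply the result by a 1000-Hz carrier. The sketch below builds the filter as two cascaded biquad sections with a bilinear-transform design and applies it in a single causal pass. The single-pass filtering, the sampling rate, the toy input, and all function names are assumptions; level calibration to 80 dBC is omitted. In practice a signal-processing library such as SciPy would typically be used for the filter design.

```python
import math


def butter4_lowpass_sections(cutoff_hz, fs_hz):
    """Two biquad sections whose cascade is a 4th-order Butterworth
    low-pass filter (bilinear transform, -3 dB at `cutoff_hz`)."""
    w0 = 2 * math.pi * cutoff_hz / fs_hz
    cos_w0 = math.cos(w0)
    sections = []
    for k in (0, 1):
        # Q of each Butterworth pole pair: 1 / (2*cos(pi*(2k+1)/8)).
        q = 1 / (2 * math.cos(math.pi * (2 * k + 1) / 8))
        alpha = math.sin(w0) / (2 * q)
        a0 = 1 + alpha
        b = [(1 - cos_w0) / 2 / a0, (1 - cos_w0) / a0, (1 - cos_w0) / 2 / a0]
        a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
        sections.append((b, a))
    return sections


def biquad_filter(x, b, a):
    """Apply one biquad section (direct form I) to the sequence `x`."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def envelope_signal(speech, fs_hz, carrier_hz=1000.0, cutoff_hz=50.0):
    """Impose the low-pass-filtered, half-wave-rectified envelope of
    `speech` on a sinusoidal carrier."""
    envelope = [max(s, 0.0) for s in speech]          # half-wave rectify
    for b, a in butter4_lowpass_sections(cutoff_hz, fs_hz):
        envelope = biquad_filter(envelope, b, a)      # 50-Hz low-pass
    return [e * math.sin(2 * math.pi * carrier_hz * n / fs_hz)
            for n, e in enumerate(envelope)]


# Toy input (hypothetical): a 200-Hz sinusoid with a 50-ms linear rise.
FS_HZ = 16000
toy_speech = [min(i / (0.05 * FS_HZ), 1.0) * math.sin(2 * math.pi * 200 * i / FS_HZ)
              for i in range(int(0.450 * FS_HZ))]
env_signal = envelope_signal(toy_speech, FS_HZ)
print(len(env_signal))
```

A causal filter delays the envelope by a few milliseconds; a zero-phase (forward and backward) pass would avoid that delay at the cost of doubling the effective filter order, and the source does not state which approach was taken.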
Electrophysiologic parameters were the same as in experiment 1, as were peak-picking procedures. All conditions were tested in a single session (∼2.5 h) to avoid the potential session-to-session variability that was present in experiment 1.
3.2. Results and discussion
Figure 3 demonstrates that as SNR decreases, CAEP latencies to both tone and /ba/ stimuli increase; however, latencies increase at a faster rate with decreasing SNR for /ba/ than for both tonal signals, which is consistent with experiment 1 results. Interestingly, peak latencies for the envelope signal fall between those of the /ba/ and tone signals at poorer SNRs, suggesting that a combination of envelope and spectral characteristics drives the signal-type-by-SNR interaction. A series of paired comparisons was completed to demonstrate the effects of signal type at 15, 5, and −5 dB SNR. Two signal type comparisons were of particular interest: (1) /ba/ vs envelope and (2) tone vs envelope. Thus, six comparisons were completed for each latency measure (i.e., 2 comparisons by 3 SNRs). Significant differences for the given comparisons were present for P1 latency at a 5-dB SNR for envelope minus tone (t-value [t] = 4.3, degrees of freedom [df] = 4, p = 0.012, effect size [ES] = 11.0, standard error of the mean [SEM] = 2.5), N1 latency at SNRs of 15 dB (envelope minus tone: t = 3.9, df = 4, p = 0.017, ES = 9.8, SEM = 2.5; /ba/ minus envelope: t = 3.1, df = 4, p = 0.035, ES = 12.4, SEM = 4.0) and 5 dB (envelope minus tone: t = 5.9, df = 4, p = 0.004, ES = 16.0, SEM = 2.7; /ba/ minus envelope: t = 5.1, df = 4, p = 0.007, ES = 26.0, SEM = 5.1), and P2 latency at a 15-dB SNR (/ba/ minus envelope: t = 2.9, df = 4, p = 0.044, ES = 21.6, SEM = 7.4). Using a Bonferroni-corrected alpha level of 0.008, only the N1 latency differences at 5 dB would be significant. Therefore, experiment 2, which was designed to determine the latency effects of a 1000-Hz tone with the temporal envelope of the syllable /ba/, revealed that latencies associated with the envelope signal generally fell between latencies using the tone and /ba/, and were significantly different from both tone and /ba/ in the case of N1 latency at a 5-dB SNR.
This suggests that both temporal onset characteristics and spectral content of the envelope signal contribute to the CAEP latencies. We hypothesized that temporal rise characteristics of the onset would account for the interaction between signal type and SNR that was found in experiment 1. However, because the latencies associated with the envelope signal do not match those of the /ba/ signal, it is clear that spectral content of the signals is also an important factor.
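The Bonferroni step in the paired comparisons reported above can be checked with simple arithmetic. The sketch below assumes a family-wise alpha of 0.05 (implied by 0.05/6 ≈ 0.008) and takes the t, ES, SEM, and p values directly from the text. The reported values are also consistent with t ≈ ES/SEM, which would make ES the mean paired latency difference in milliseconds; that reading is an inference and is not stated in the source.

```python
ALPHA = 0.05          # assumed family-wise level (0.05 / 6 is about 0.008)
N_COMPARISONS = 6     # 2 signal contrasts x 3 SNRs per latency measure

# (measure, SNR in dB, contrast, t, ES, SEM, p) as reported in the text.
reported = [
    ("P1", 5, "envelope - tone", 4.3, 11.0, 2.5, 0.012),
    ("N1", 15, "envelope - tone", 3.9, 9.8, 2.5, 0.017),
    ("N1", 15, "/ba/ - envelope", 3.1, 12.4, 4.0, 0.035),
    ("N1", 5, "envelope - tone", 5.9, 16.0, 2.7, 0.004),
    ("N1", 5, "/ba/ - envelope", 5.1, 26.0, 5.1, 0.007),
    ("P2", 15, "/ba/ - envelope", 2.9, 21.6, 7.4, 0.044),
]

bonferroni_alpha = ALPHA / N_COMPARISONS
print(f"Bonferroni-corrected alpha: {bonferroni_alpha:.4f}")

survivors = []
t_checks = []
for measure, snr_db, contrast, t, es, sem, p in reported:
    t_check = es / sem                 # inferred relation, see lead-in
    t_checks.append((t, t_check))
    passes = p < bonferroni_alpha
    if passes:
        survivors.append((measure, snr_db, contrast))
    print(f"{measure} {snr_db:>2} dB {contrast:<16} t={t:.1f} "
          f"ES/SEM={t_check:.2f} p={p:.3f} "
          f"{'significant' if passes else 'not significant'} after correction")
```

Only the two N1 comparisons at 5 dB SNR fall below the corrected threshold, in agreement with the statement in the text.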
Fig. 3.
(Color online) P1, N1, and P2 peak latencies across SNRs (error bars = standard error of the mean). In conditions with noise, most mean latencies to the envelope signal (dotted red) fell between /ba/ latencies (solid blue) and tone latencies (dashed green), with the greatest apparent differences between signal conditions at an SNR of +5 dB.
4. General discussion and conclusion
The purpose of these experiments was to determine the effects of stimulus type in the presence of varying levels of background noise. Several studies have addressed the effects of stimulus type in quiet and have shown that more abrupt onsets result in shorter latencies (e.g., Swink and Stuart, 2012; McCandless and Best, 1966). However, no study that we are aware of has illustrated the effects of signal type on cortical neural encoding while varying SNR. The potential interaction of signal type and background noise on speech encoding is important given the difficulties associated with speech understanding in background noise.
Experiment 1 illustrated an important interaction between signal type and SNR such that as SNR decreases, larger delays were found for the speech signal compared to the tonal signal. Based on the demonstrated role onset characteristics play in the resulting CAEP latencies and amplitudes (Onishi and Davis, 1968), we hypothesized that temporal envelope would be the primary contributor to the signal-type-by-SNR interaction. However, experiment 2 demonstrated that both temporal envelope and spectral content drive the response. Interestingly, latencies resulting from the envelope signal fall between the tone- and /ba/-related latencies, especially at the poorer SNRs (see Fig. 3), presumably because the higher levels of background noise increasingly affect the neural coding of both temporal and spectral content.
Effects of signal type are not particularly new to the literature (e.g., Woods and Elmasian, 1986; Čeponiené et al., 2001; Swink and Stuart, 2012); however, the interactions between signal type and SNR demonstrated here are novel and have implications for emerging speech-in-noise electrophysiology studies. The temporal and spectral characteristics of both the signal and noise should be considered carefully so as to take into account potential interactions.
In conclusion, both signal type and SNR affected CAEP latencies; moreover, an interesting interaction between these variables demonstrated that latencies to speech increased more rapidly than tonal latencies as SNR became poorer. Furthermore, it is clear that temporal envelope (i.e., onset effects such as rise time) is partially responsible for the signal-type-by-SNR interaction, with spectral content characteristics presumably responsible for the remaining effect. It should be noted that depending on the noise type, contributions of temporal envelope and spectral content will be different. Additional testing will be needed to determine the magnitude of the relative contributions of both factors. This study provides additional detailed understanding of the neural coding of speech signals in background noise and helps to refine the potential use of CAEPs in both research and clinical populations.
Acknowledgments
We wish to thank Tina Penman and Angela Eilbes for assistance with data acquisition and processing, Garnett McMillan for statistical support, and Robert Burkard for helpful comments regarding the design of experiment 2. This work was supported by the United States (U.S.) Department of Veterans Affairs Rehabilitation Research and Development Service and the National Institutes of Health. The contents do not represent the views of the U.S. Department of Veterans Affairs or the U.S. Government.
References and links
1. Billings, C. J., Bennett, K. O., Molis, M. R., and Leek, M. R. (2011). “Cortical encoding of signals in noise: Effects of stimulus type and recording paradigm,” Ear Hear. 32, 53–60. doi:10.1097/AUD.0b013e3181ec5c46
2. Billings, C. J., Penman, T. M., McMillan, G. P., and Ellis, E. (2015). “Electrophysiology and perception of speech in noise in older listeners: Effects of hearing impairment and age,” Ear Hear. 36(6), 710–722. doi:10.1097/AUD.0000000000000191
3. Billings, C. J., Tremblay, K. L., Stecker, G. C., and Tolin, W. M. (2009). “Human evoked cortical activity to signal-to-noise ratio and absolute signal level,” Hear. Res. 254(1), 15–24. doi:10.1016/j.heares.2009.04.002
4. Čeponiené, R., Shestakova, A., Balan, P., Alku, P., Yiaguchi, K., and Näätänen, R. (2001). “Children's auditory event-related potentials index sound complexity and ‘speechness,’” Int. J. Neurosci. 109(3-4), 245–260. doi:10.3109/00207450108986536
5. Martin, B. A., Tremblay, K. L., and Korczak, P. (2008). “Speech evoked potentials: From the laboratory to the clinic,” Ear Hear. 29(3), 285–313. doi:10.1097/AUD.0b013e3181662c0e
6. McCandless, G. A., and Best, L. (1966). “Summed evoked responses using pure-tone stimuli,” J. Speech Hear. Res. 9(2), 266–272. doi:10.1044/jshr.0902.266
7. Michalewski, H. J., Starr, A., Zeng, F. G., and Dimitrijevic, A. (2009). “N100 cortical potentials accompanying disrupted auditory nerve activity in auditory neuropathy (AN): Effects of signal intensity and continuous noise,” Clin. Neurophysiol. 120(7), 1352–1363. doi:10.1016/j.clinph.2009.05.013
8. Onishi, S., and Davis, H. (1968). “Effects of duration and rise time of tone bursts on evoked V potentials,” J. Acoust. Soc. Am. 44(2), 582–591. doi:10.1121/1.1911124
9. Papesh, M. A., Billings, C. J., and Baltzell, L. S. (2015). “Background noise can enhance cortical auditory evoked potentials under certain conditions,” Clin. Neurophysiol. 126(7), 1319–1330. doi:10.1016/j.clinph.2014.10.017
10. Rosen, S. (1992). “Temporal information in speech: Acoustic, auditory and linguistic aspects,” Philos. Trans. R. Soc. London, Ser. B 336(1278), 367–373. doi:10.1098/rstb.1992.0070
11. Swink, S., and Stuart, A. (2012). “Auditory long latency responses to tonal and speech stimuli,” J. Speech Lang. Hear. Res. 55(2), 447–459. doi:10.1044/1092-4388(2011/10-0364)
12. Tiitinen, H., Sivonen, P., Alku, P., Virtanen, J., and Näätänen, R. (1999). “Electromagnetic recordings reveal latency differences in speech and tone processing in humans,” Cogn. Brain Res. 8(3), 355–363. doi:10.1016/S0926-6410(99)00028-2
13. Woods, D. L., and Elmasian, R. (1986). “The habituation of event-related potentials to speech sounds and tones,” Electroencephalogr. Clin. Neurophysiol. 65(6), 447–459. doi:10.1016/0168-5597(86)90024-9



