Author manuscript; available in PMC: 2015 Jul 6.
Published in final edited form as: J Acoust Soc Am. 2014 Jun;135(6):EL357–EL363. doi: 10.1121/1.4879667

Temporal predictability enhances auditory detection

Emma L A Lawrance 1, Nicol S Harper 1, James E Cooke 1, Jan W H Schnupp 1
PMCID: PMC4491983  EMSID: EMS63481  PMID: 24907846

Abstract

Periodic stimuli are common in natural environments and are ecologically relevant, for example, footsteps and vocalizations. This study reports a detectability enhancement for temporally cued, periodic sequences. Target noise bursts (embedded in background noise) arriving at the time points which followed on from an introductory, periodic “cue” sequence were more easily detected (by ~1.5 dB SNR) than identical noise bursts which randomly deviated from the cued temporal pattern. Temporal predictability and corresponding neuronal “entrainment” have been widely theorized to underlie important processes in auditory scene analysis and to confer perceptual advantage. This is the first study in the auditory domain to clearly demonstrate a perceptual enhancement of temporally predictable, near-threshold stimuli.

Introduction

Many natural auditory stimuli contain temporal regularities,1 for instance, footsteps of potential prey or predators, or the structure of the speech envelope. These regularities make the on-going stimuli partly predictable, which is a feature that the brain may be able to exploit.2 Predictive coding is a theoretical framework whereby the brain exploits regularities in sensory input to optimize processing.3 Evidence for this to date comes largely from the visual domain.4 In comparison, the use of predictive coding to process temporal regularities in audition is a relatively understudied phenomenon. Here we investigate the possibility that the detectability of near-threshold acoustic events (noise bursts) might be enhanced if these events form a continuation of a supra-threshold periodic series.

One neural process thought to underlie anticipatory processes is “entrainment,” or “phase-locking” of oscillations in activity (membrane potential or firing rates) of a neuronal ensemble to the structure of a periodic, statistically predictable attended stimulus.5 Specifically, low-frequency oscillations in cortical neural activity (including auditory cortex6,7), which reflect cyclical changes in population excitability,8 can align with the timing of event onsets.4 Low-frequency neuronal oscillations can become entrained to the envelope of natural sounds9 and speech.10 Hence entrainment concentrates high-excitability phases of activity at times of expected important auditory events, increasing their likelihood of eliciting a neural response.7,11

This physiological entrainment of fluctuations in neural excitability should have perceptual and behavioral consequences if it is of functional importance. Indeed, temporal regularity in visual stimuli has been shown to improve target discrimination.2 In the auditory domain, stimulus regularity and/or entrainment have been theorized to influence sound source segregation (“the cocktail party problem”)1 and attention (dynamic attending theory12-14). Various experiments show enhanced detection of pitch,15 spectral alterations in a sequence of tones,16 or altered interval durations17,18 for targets arriving in line with expectations conveyed through a periodic sequence. Temporal expectations set up through auditory regularities also decrease reaction times to targets19 and alter attention to deviant stimuli.20 Indeed, behavioral measures of the influence of temporal expectation have been linked to aggregate electrophysiological measurements (electroencephalography, EEG) of entrainment: Improved reaction speed is observed when target stimuli coincide with the peak of the entrained EEG pattern.21 However, the effect of auditory regularity on near-threshold target detection is little examined. Ng et al.22 reported a neuronal phase modulation of detection that was stronger for “miss” than for “hit” trials, suggesting an inhibiting, but not “ensuring” role of entrainment on detection. It has been argued that their “entraining” stimulus did not include sufficient temporal structure to promote entrainment.11 Additionally, a gap detection study suggested a role for neuronal oscillatory phase in modulating gap detection.23 In apparent contrast to work from the visual domain, Zoefel and Heil24 did not find the detectability of near-threshold tones presented at 0.5 Hz to depend on the phase of delta-oscillations, although the amplitude of the 0.5 Hz entrained oscillation was higher for hits than for misses. 
As yet, no study has explored the influence of continuing periodicity on auditory detection. We investigate this, against a backdrop of strong ethological relevance and theorized likelihood that regularity modifies auditory processing.

This psychophysics study examines whether there are decreased detection thresholds for temporally predictable stimuli following an explicit temporally periodic “entraining” introductory sequence. The use of target stimuli near threshold increases the perceptual demand, allowing us to explore perceptual modulations of “entrainment,” reminiscent of Cravo et al.4 in the visual domain. We compared the detection thresholds for five “on-beat” periodic target noise-bursts, to those of five aperiodic but otherwise identical targets. If the temporal predictability (induced by the introductory sequence) of periodic targets enhances their processing, then periodic targets should be detectable at lower signal-to-noise ratios (SNRs). This was indeed observed, with significantly enhanced detection of the periodic stimuli relative to the aperiodic stimuli.

Methods

The methodology was approved by the Ethical Review Committee of the Experimental Psychology Department of the University of Oxford, and it conforms to the ethical standards of the 1964 Declaration of Helsinki. We used an adaptive method to determine detection thresholds for periodic and aperiodic broadband noise-bursts in a background of continuous broadband noise. Experiments were conducted in a brightly illuminated sound-attenuating chamber. MATLAB was used for stimulus generation, response collection, and analysis. Stimuli were played through a TDT RM1 mobile processor (Tucker-Davis Technologies, Alachua, FL, USA), and presented diotically over Sennheiser (Wedemark, Germany) HD 650 headphones.

All stimuli were digitally generated broadband Gaussian noise with a sampling rate of 48 828 Hz. Background noise was presented at a comfortable listening level (55 dB sound pressure level) for the duration of the experiment. All stimuli were constructed as sequences of brief (25 ms) noise bursts atop the on-going noise background. These were implemented by transiently raising the noise intensity by a given value L in dB (hence an L of 0 dB signifies the lack of a noise burst). Signal intensities are reported in dB signal-to-noise ratio (SNR). The relationship between the noise burst level above background, L (dB), and the signal-to-noise ratio, R (in dB) is given by the equation

$$R = 10\log_{10}\left(10^{L/10} - 1\right). \tag{1}$$
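As an illustration, the conversion in Eq. (1) can be sketched in Python, reading the equation as R = 10 log10(10^(L/10) − 1): the burst raises total power by L dB, so the added signal power relative to the background is 10^(L/10) − 1. The function names are hypothetical and are not taken from the authors' MATLAB code.

```python
import math


def snr_from_level(level_db: float) -> float:
    """Eq. (1): SNR R (dB) of a burst that raises the noise level by L dB."""
    if level_db <= 0:
        # L = 0 dB means no burst was added, so there is no signal power.
        return float("-inf")
    return 10.0 * math.log10(10.0 ** (level_db / 10.0) - 1.0)


def level_from_snr(snr_db: float) -> float:
    """Inverse of Eq. (1): level increment L (dB) needed for a given SNR R (dB)."""
    return 10.0 * math.log10(10.0 ** (snr_db / 10.0) + 1.0)
```

Under this reading, a burst at 0 dB SNR (signal power equal to background power) corresponds to a level increment of about 3 dB, since the total power doubles.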

All stimuli started with an introductory section of seven noise-bursts presented at a regular 4 Hz rate, but with rapidly decreasing intensities (8.6389, 6.9732, 5.4554, 4.1244, 3.0103, 2.1244, 1.4554 dB SNR) to generate the percept of a periodic signal that rapidly fades into the background noise. The introductory section of seven noise bursts was followed by a target section which either contained a further five target bursts (25 ms, all with identical SNR) or it did not. Stimuli were presented in pairs, one with target noise bursts and one without, in random order (with equal likelihood), and participants were asked to indicate which of the two stimulus intervals contained target bursts (a “Two Interval Forced Choice” or 2IFC procedure).25 To determine detection thresholds, the sound levels of the target bursts were adjusted from trial to trial using an adaptive procedure described further below. During any one 2IFC trial, the five target bursts could be either periodic or aperiodic (Fig. 1). For periodic stimuli, the five target bursts followed the 4 Hz rate established by the seven introductory bursts, so their timing was predictable. In aperiodic stimuli, in contrast, the intervals between target bursts were independently and randomly chosen from a uniform random distribution over 150–350 ms, so the 4 Hz time frame established by the introductory noise bursts provided almost no cues as to when the target bursts might occur. Trials containing either periodic or aperiodic stimuli were randomly interleaved (with equal likelihood), so that detection thresholds could be determined for each stimulus type. No performance feedback was provided.
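The burst timing described above can be sketched as follows. This is a minimal illustration, not the authors' stimulus code: it assumes the first introductory burst starts at t = 0, that intervals are measured onset to onset, and that in aperiodic trials the interval between the last introductory burst and the first target burst is jittered in the same way as the intervals between targets. All names are hypothetical.

```python
import random

RATE_HZ = 4.0              # repetition rate of the introductory sequence
PERIOD_S = 1.0 / RATE_HZ   # 250 ms between burst onsets
N_INTRO = 7                # introductory (cue) bursts
N_TARGET = 5               # target bursts


def burst_onsets(periodic: bool, rng: random.Random) -> tuple[list[float], list[float]]:
    """Return (intro_onsets, target_onsets) in seconds for one stimulus."""
    intro = [i * PERIOD_S for i in range(N_INTRO)]
    targets = []
    t = intro[-1]
    for _ in range(N_TARGET):
        # Periodic targets continue the 4 Hz beat; aperiodic targets use
        # independent intervals drawn uniformly from 150-350 ms.
        interval = PERIOD_S if periodic else rng.uniform(0.150, 0.350)
        t += interval
        targets.append(t)
    return intro, targets


if __name__ == "__main__":
    rng = random.Random(0)
    print(burst_onsets(True, rng)[1])
    print(burst_onsets(False, rng)[1])
```

Because the uniform distribution is centered on 250 ms, the aperiodic targets have the same mean rate as the periodic ones and differ only in predictability.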

FIG. 1.

Experimental protocol. Each trial consisted of two intervals (4.5 s each), with continuous broadband background noise throughout. One interval contained only seven introductory start bursts, while the other interval contained the seven introductory bursts followed by five target bursts (level adapted to participant performance at detecting target bursts). For periodic trials, the target bursts continued the introductory periodic sequence (on-beat), while for aperiodic trials the target bursts were “jittered” from the “on-beat” position. Timings shown are taken from an exemplar experimental protocol.

The target burst intensity started at 3.35 dB SNR for both conditions (very easily audible, with consequent perfect performance) and was reduced in 2 dB SNR steps for each correct response until the first participant error. Thereafter, target burst SNR was adjusted depending on performance in the previous two trials according to the following schedule: correct-correct: −0.5 dB SNR, incorrect-correct: no change, correct-incorrect: +0.5 dB SNR, incorrect-incorrect: +0.5 dB SNR. The task finished after 15 reversal points (local maxima/minima) on both adaptive tracks, with both conditions running until completion point for the lagging track.
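The update schedule above can be written as a small function. This is a sketch under stated assumptions rather than the authors' implementation: the response history is kept per adaptive track, the first error itself is treated as a correct-incorrect pair (+0.5 dB), and the function name is hypothetical.

```python
def next_target_snr(current_snr_db: float, responses: list[bool]) -> float:
    """Target SNR for the next trial on one adaptive track.

    `responses` holds the track's past outcomes (True = correct),
    oldest first, including the trial just completed.
    """
    if not responses:
        return current_snr_db            # nothing to adapt to yet
    if all(responses):
        return current_snr_db - 2.0      # initial phase: 2 dB steps until first error
    if not responses[-1]:
        return current_snr_db + 0.5      # correct-incorrect or incorrect-incorrect
    if responses[-2]:
        return current_snr_db - 0.5      # correct-correct
    return current_snr_db                # incorrect-correct: no change
```

For example, `next_target_snr(1.0, [True, True, False])` returns 1.5, and a following correct response leaves the level unchanged until a second consecutive correct response lowers it by 0.5 dB.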

Traditionally, adaptive methods are analyzed using thresholds calculated from the average of reversal points (burst SNR at local maxima/minima).25 Such methods use only a small proportion of available data. To exploit the full data set, we used a binomial probit regression analysis to fit psychometric curves to the responses to every trial. These curve fits were further constrained to chance (50%) performance at target burst levels of L = 0 dB, and to perfect (100%) performance at L ≥ 3 dB. While it has been shown that fitting probit models to data gathered with an adaptive method introduces a positive bias in the estimated slope parameters, constraining the model to p = 0.5 for L = 0 dB, as we have done, has been demonstrated to reduce this bias to very small values (a few percent of the slope for data sets containing 100 points or more).26 The probit analysis models the relationship between the probability of a correct response and the stimulus parameters on each trial according to

$$P(\text{correct response}) = \Phi\left(b_1 L + b_2 T L\right), \tag{2}$$

where Φ is the cumulative standard normal distribution (with μ = 0 and σ = 1), L is the target noise-burst level above background in dB, T is the trial type (0 for periodic trials, 1 for aperiodic trials), and b1 and b2 are the adjustable weight parameters of the model, which were estimated from the data using maximum likelihood techniques (MATLAB function glmfit). When the trial type is periodic, T = 0, and hence the probability of a correct response is explained by level alone: P(correct response) = Φ(b1L). When T = 1, the effect of level on the probability of a correct response is altered by b2: P(correct response) = Φ((b1 + b2)L). Hence if the aperiodic condition confers an advantage, b2 will be positive. Conversely, if our hypothesis is correct and periodic targets are more easily detected, then b2 will be negative, as irregularity decreases the probability of a correct response at a particular target level L.
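To make the model concrete, the sketch below evaluates the probit function of Eq. (2) and inverts it to find the level at which a chosen proportion correct is reached. It does not reproduce the maximum likelihood fit. The function names are hypothetical, the slope values in the usage note are illustrative only, and the default criterion of 0.707 is the one used for thresholds in the Results.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution, mu = 0 and sigma = 1


def p_correct(level_db: float, aperiodic: bool, b1: float, b2: float) -> float:
    """Eq. (2): probability of a correct response at level L (dB above background)."""
    t = 1.0 if aperiodic else 0.0
    return PHI.cdf(b1 * level_db + b2 * t * level_db)


def threshold_level(aperiodic: bool, b1: float, b2: float, p_target: float = 0.707) -> float:
    """Level L (dB) at which the fitted curve reaches p_target."""
    slope = b1 + (b2 if aperiodic else 0.0)
    return PHI.inv_cdf(p_target) / slope
```

With illustrative values such as b1 = 0.85 and b2 = −0.20, `p_correct(0.0, True, 0.85, -0.20)` is exactly 0.5 (the chance constraint at L = 0 dB), and the aperiodic threshold level is higher than the periodic one because the negative b2 flattens the aperiodic curve.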

Results

The 26 participants (aged 21–37, mean 24.9 yr, 16 female) all had normal hearing. As explained above, the responses of the subjects were fitted with psychometric curves of the form described in Eq. (2), in which the b2 parameter quantifies the “advantage” of aperiodic relative to regular, periodic target noise bursts. For 21/26 participants, b2 was negative, indicating improved detection of periodic over aperiodic targets. For the population, b2 is significantly different from zero (p = 0.0025, sign test), in support of our hypothesis. Unsurprisingly, there is also a significant effect of target level on detection: b1 is significantly positive across the population (p = 2.98 × 10^−8, sign test), indicating that correct response probability increases with level. From the probit model [Eq. (2)] the slope of performance increase with level is given by b1 in the periodic condition (mean: 0.85) and b1 + b2 in the aperiodic condition (mean: 0.65). Equation (3) quantifies the percentage increase (S) in the slope of the psychometric function when going from an aperiodic to periodic condition,

$$S = 100\left(\frac{b_1}{b_1 + b_2} - 1\right). \tag{3}$$
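The sign-test p-values quoted for b2 and b1 can be checked with a short exact calculation. The sketch below assumes a two-sided exact sign test with no ties, and, for b1, that all 26 participants had a positive slope (an assumption that is consistent with the reported p-value but is not stated explicitly). The function name is hypothetical.

```python
from math import comb


def sign_test_p(n_same_sign: int, n_total: int) -> float:
    """Two-sided exact sign test p-value under a fair-coin null hypothesis."""
    k = max(n_same_sign, n_total - n_same_sign)
    # Probability of k or more participants sharing a sign by chance.
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    print(sign_test_p(21, 26))  # approximately 0.0025
    print(sign_test_p(26, 26))  # approximately 2.98e-08
```

Both values agree with the figures reported above for 21 of 26 and 26 of 26 participants.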

The mean increase in slope for the periodic over the aperiodic condition is 46% across all participants, or 63% across the 21 participants with enhanced performance in periodic trials. The detectability enhancement for periodic trials is also evident in Fig. 2(A), which shows the number of correct and incorrect trials completed across all participants for both trial types at the target levels roved during the adaptive paradigm. There is a shift in the peak of the distribution of correct and incorrect trials to lower signal-to-noise levels for the periodic trial type. As target level on each trial was dependent on participant performance (adaptive paradigm), this shift indicates higher performance in periodic trials relative to aperiodic trials. A comparable shift is evident in Fig. 2(B), which plots the ratio of correct trials to total trials, calculated from Fig. 2(A).

FIG. 2.

(Color online) (A) Number of correct and incorrect trials for aperiodic (red) and periodic (black) trials, collated for all participants, at the target levels (dB SNR) roved during the adaptive paradigm. The target level for each trial was dependent on participant performance. A modest shift to lower signal-to-noise ratios is evident for periodic trials completed, indicative of enhanced performance relative to aperiodic trials. (B) Proportion of trials with correct target detection, for each target level bracket (three point average), with raw data also shown. The color saturation indicates the number of trials completed in each target level bracket for target levels with more than 45 trials completed across all participants. (C) Aperiodic threshold versus periodic threshold, compared to the line of unity. In 21 of 26 participants, periodic thresholds are lower than aperiodic thresholds, which indicates an enhanced detection of periodic target sounds.

Detection thresholds for both trial types [level for P(correct response) = 0.707]25 were determined from the probit fits to the psychometric functions and are shown in Fig. 2(C). A lower threshold for periodic than for aperiodic targets indicates increased detectability for the periodic targets, as observed for 21/26 participants [Fig. 2(C)]. The difference in thresholds (aperiodic minus periodic) provides a measure of the enhancement conferred to the periodic stimuli. The mean difference across participants is 1.5 dB SNR (median 1.8 dB SNR), equivalent to detection of the periodic stimulus at ~75% of the power of the aperiodic stimulus. Hence, the temporal cueing provided by the established periodic sequence can lower detection thresholds for stimuli by approximately 1.5 dB SNR on average, although the effect size varies considerably across participants, with a lowering of thresholds by as much as 3–5 dB SNR for several participants, and with five participants showing the reverse, or no improvement.

The aperiodic threshold may be thought of as a “baseline” measure of a participant’s ability to detect targets in a purely “feed forward” manner, without any advantage conferred by temporally predictable stimuli. The effect size (difference in thresholds) correlates strongly and significantly (p < 0.0001) with aperiodic threshold, suggesting that periodicity confers the largest absolute advantage to participants who struggle most to detect target sounds generally.

Discussion

Sensory processing advantages have been suggested to accrue from predictive coding for expected future events. Our study supports the hypothesis that sounds are more easily detected when they are the continuation of a temporally predictable sequence.

The variation in effect size across participants may be due to several factors, including differing entrainment susceptibility or alertness levels during the study. While there are no auditory paradigms sufficiently similar to enable an effect size comparison, an analogous recent visual experiment reports an increase of up to 55% (stimulus “intensity”) in participants’ sensitivity for unmasking visual stimuli arriving in phase with the entrained neural oscillation,27 of similar order to our finding of a 25% reduction in signal power required for detection of periodic relative to aperiodic targets.

The observed perceptual advantage for periodic targets is possibly due to heightened processing of sounds arriving at high-excitability phases of a neuronal oscillation entrained to the introductory burst sequence. The EEG can entrain to low frequency modulations of continuous noise23 or sequences of tones.20 Repetitive sounds, as used in this study, are common in behaviorally relevant scenarios, e.g., footsteps, and it has been proposed that an entrainment of neuronal oscillations can play a role in important perceptual functions such as selective attention,8 speech processing,28 and musical processing.29 The modulation rate of our stimuli (4 Hz) matches a number of modulation patterns in speech, for example, the temporal envelope of speech contains periodic structure imparted by prosody and phrasing (<4 Hz) and syllable structure (4–8 Hz, which overlaps with the theta frequencies in EEG).30

Four recent studies investigated the influence of neuronal oscillatory phase on detection of auditory stimuli, but they produced mixed results.22-24,31 There are important distinctions between the (sometimes controversial) protocols used across these studies and our study, which may explain the heterogeneity of results.11 Neuling et al.31 report that phase of alpha-oscillations entrained using oscillating transcranial direct current stimulation (o-TDCS, 10 Hz) modulates auditory target detection in background noise, with 0.6 dB normalized SNR difference between “best” and “worst” oscillatory phase. Yet there may be unknown differences between o-TDCS induced entrainment and that driven using background sound, and results could be confounded by o-TDCS periodically activating middle ear muscles.24 Additionally, Henry and Obleser23 report that gap detection varies with the phase of neural delta-oscillations entrained by a 3 Hz frequency modulated stimulus.

In contrast, Ng et al.22 report that the phase of “entrained” theta oscillations modulates “miss rates” for auditory target detection in natural backgrounds more strongly than “hit rates,” suggesting entrained phase can “preclude” but not “ensure” (“boost”) detection of regular auditory stimuli. Their protocol has been criticized for lacking explicit regular entrainment-promoting structure in the natural sound background.11 Further, Zoefel and Heil24 raise the possibility that the positive results of Ng et al.22 and Henry and Obleser23 may result from methodological errors in measuring entrained phase. No other study has reliably demonstrated a behavioral benefit to detectability conferred by predictable temporal regularity of an auditory stimulus.

Conclusions

This study demonstrates a detectability enhancement for temporally predictable near-threshold auditory stimuli. Periodic broadband noise-bursts (embedded in background broadband noise) that followed the introductory repetition rate were detected at significantly lower signal-to-noise ratios than aperiodic bursts. The effect was modest, with a 1.5 dB SNR detection advantage conferred to the periodic stimuli relative to the aperiodic. This phenomenon is ecologically adaptive, boosting detection of relevant periodic auditory stimuli such as footsteps in background noise. Additionally, our result offers a psychophysical correlate and perceptual function for the phenomenon of neural entrainment.

Acknowledgments

This work was supported by a Clarendon Fund scholarship to E.L.A.L., and a Sir Henry Wellcome Postdoctoral Fellowship [WT076508AIA], and funds from the Department of Physiology, Anatomy and Genetics, University of Oxford to N.S.H. Thanks to Geoff Nicholls and Ben Willmore for assistance with analysis and experimental setup.
