Abstract
In this study, a nonlinear version of the stimulus-frequency OAE (SFOAE), called the nSFOAE, was used to measure cochlear responses from human subjects while they simultaneously performed behavioral tasks requiring, or not requiring, selective auditory attention. Appended to each stimulus presentation, and included in the calculation of each nSFOAE response, was a 30-ms silent period that was used to estimate the level of the inherent physiological noise in the ear canals of our subjects during each behavioral condition. Physiological-noise magnitudes were higher (noisier) for all subjects in the inattention task, and lower (quieter) in the selective auditory-attention tasks. These noise measures initially were made at the frequency of our nSFOAE probe tone (4.0 kHz), but the same attention effects also were observed across a wide range of frequencies. We attribute the observed differences in physiological-noise magnitudes between the inattention and attention conditions to different levels of efferent activation associated with the differing attentional demands of the behavioral tasks. One hypothesis is that when the attentional demand is relatively great, efferent activation is relatively high, and a decrease in the gain of the cochlear amplifier leads to lower-amplitude cochlear activity, and thus a smaller measure of noise from the ear.
Keywords: selective attention, otoacoustic emission, cochlear noise
1. INTRODUCTION
Although much has been learned about the anatomy, neurophysiology, and biochemistry of the olivocochlear efferent system since the early reports of Rasmussen (1946, 1953), its function during everyday listening remains uncertain. Motivated by the seminal, if controversial, report by Hernández-Peón et al. (1956), there has been continual interest in the question of whether the olivocochlear efferent system plays a role in selective attention (e.g., Picton and Hillyard, 1971; Puel et al., 1988; Meric and Collet, 1994a; Fritz et al., 2007). Hernández-Peón et al. (1956) reported that gross electrical potentials recorded from the dorsal cochlear nucleus (DCN) in the auditory brainstem of cats were reduced in magnitude when the animals attended to visual, somatic, or olfactory stimuli, relative to when the animals were in a state of inattention. Ultimately, this fascinating finding was discredited on grounds of poor experimental control (e.g., Worden, 1973). Nevertheless, it created considerable, continuing interest in the possibility that the attentional demands of a behavioral task, or those of an environment, can modulate the afferent responses of the peripheral auditory system, either at the level of the auditory brainstem (as in Hernández-Peón et al., 1956; e.g., Lukas, 1980), or in the responses of the cochlea (e.g., Puel et al., 1988; Giard et al., 1994; Maison et al., 2001; Harkrider and Bowers, 2009). Research confirms everyday experience that humans are able to control their attention (Hafter et al., 1998; Gallun et al., 2007). However, after more than half a century of research, there is a paucity of clear evidence that cognitive processes, such as the selective allocation of attentional resources, can affect the responses of the afferent auditory periphery.
If attentional demands (or other cognitive or perceptual demands) were capable of modulating afferent auditory responses at the level of the cochlea—the transduction stage in the auditory system—the medial olivocochlear bundle (MOCB) would be one neural pathway through which these effects would be realized. This pathway originates in the superior olivary complex of the brainstem, which is innervated directly by efferent neurons originating in the inferior colliculus and auditory cortex (Mulders and Robertson, 2000a, 2000b). The fibers of the MOCB terminate at the bases of the outer hair cells (OHCs) of the cochlea (Warr and Guinan, 1979), where they inhibit (hyperpolarize) the OHCs, which in turn increases the local stiffness of the cochlear partition and diminishes the displacement of the basilar membrane (Cooper and Guinan, 2006), thereby reducing the afferent output of the inner ear. If the efferent flow from the cortex to the brainstem varied with level of attention, then the activity in the MOCB would vary, as would the afferent flow from the cochlea.
The OHCs are part of the cochlear-amplifier system (Davis, 1992) that is thought to be involved in the production of otoacoustic emissions (OAEs)—weak sounds produced in the inner ear and measured in the external ear canal (Kemp, 1978, 1980). For this reason, OAEs have been used to examine how human cochlear responses are affected by MOCB activation, and in turn how the attentional demands of a behavioral task affect efferent feedback to the cochlea. Previous studies have demonstrated that OAE magnitudes measured during auditory- or visual-attention tasks were different from OAE magnitudes measured during tasks that did not require attention (Puel et al., 1988; Froehlich et al., 1990, 1993; Meric and Collet, 1992, 1994b; Giard et al., 1994; Ferber-Viart et al., 1995; Maison et al., 2001; Harkrider and Bowers, 2009). However, across studies, the directions of the attentional effects on OAEs have been inconsistent, the magnitudes of the observed differences always have been small (less than about 1 dB), and comparisons across studies have been made difficult by significant procedural differences (see Discussion).
This is the first in a series of reports describing differences in cochlear responses when human subjects are, or are not, engaged in selective attention to either auditory or visual stimuli. In all cases, physiological and behavioral auditory measures were obtained simultaneously during the same test trials. In this first report, a nonlinear version of the stimulus-frequency OAE (SFOAE), called the nSFOAE or the residual from linear prediction (Walsh et al., 2010a, 2010b), was used to measure cochlear responses during tasks that required either selective auditory attention to strings of digits spoken by one of two simultaneous talkers (dichotic or diotic listening), or relative inattention. In a companion paper, we report similar results involving visual rather than auditory attention (Walsh et al., 2014). These first two reports emphasize cochlear measures made during brief silent periods following the nSFOAE-evoking stimuli. Later we also will report parallel measurements obtained during the nSFOAE-evoking stimuli, which we call “perstimulatory” measures. Both the silent-period and perstimulatory measures exhibited marked differences during attention and inattention conditions.
Our measure of physiological noise was recorded in the external ear canals of our subjects during every behavioral condition, using the same cancellation procedure used to estimate the perstimulatory nSFOAE response. In contrast to the majority of previous studies on the effects of attention on OAEs that also measured noise levels in the test ears (Froehlich et al., 1990, 1993; Ferber-Viart et al., 1995; de Boer and Thornton, 2007; Harkrider and Bowers, 2009), every subject exhibited consistent differences in our physiological-noise measure between the inattention and selective-attention conditions. Specifically, the magnitudes of the physiological noise always were higher during the inattention condition than during the auditory selective-attention conditions, the differences being about 3.0 dB averaged across subjects, attention condition, and test frequency.
2. METHODS
2.1. General
This first report focuses on an auditory measure of the physiological noise present in the external ear canals of humans during each of several auditory-attention conditions. A nonlinear procedure was used to estimate the level of the nSFOAE during a brief silent period following each nSFOAE-evoking stimulus presentation. The Institutional Review Board at The University of Texas at Austin approved the procedures described here. All subjects provided their informed consent prior to any testing, and they were paid for their participation. The behavioral measures will be described first, followed by the physiological measures. Then, a description will be provided of the integration of the behavioral and physiological measures.
Subjects
Two males (both aged 22) and six females (aged 20 – 25) were paid an hourly rate to participate in this study. All eight subjects completed two 2-hr auditory-attention sessions. Across those sessions, each subject completed each of the experimental conditions to be described a minimum of four times. All subjects had normal hearing [≤ 15 dB Hearing Level (HL)] at octave frequencies between 250 and 8000 Hz, and normal middle-ear and tympanic reflexes, as determined using an audiometric screening device (Auto Tymp 38, GSI/VIASYS, Inc., Madison, WI). Across the eight subjects, two ears, and four frequencies (0.5, 1.0, 2.0, and 4.0 kHz), the average middle-ear reflex (MER) threshold in our subjects was about 91 dB HL, and no individual subject had unusually low or high thresholds. No subject had a spontaneous otoacoustic emission (SOAE) stronger than −15.0 dB SPL within 600 Hz of the frequency of the 4.0-kHz probe tone used to elicit the nSFOAE.
2.2. Behavioral measures
Each subject was tested individually while seated in a reclining chair inside a double-walled, sound-attenuated room. Two insert earphone systems delivered sounds directly to the two external ear canals. (The earphone systems are described in detail in section 2.3 below.) Some of the sounds presented were relevant for the behavioral task, and interleaved with these sounds were the stimuli for evoking the nSFOAE response. A computer screen attached to an articulating mounting arm was positioned by the subject to a comfortable viewing distance, and was used to provide task instructions and trial-by-trial feedback. A numerical keypad was provided to the subject to indicate his or her responses on the behavioral task.
2.2.1. Selective auditory-attention conditions
Dichotic condition
There were two auditory selective-attention conditions: one involved dichotic presentation of the stimuli for the behavioral task, and one involved diotic presentation. For the dichotic-listening condition, two competing speech streams were presented separately to the ears, and the task of the subject was to attend to one of the speech streams. In one ear the talker was female, in the other ear the talker was male, and which ear received the female talker was selected trial-by-trial from a closed set of random permutations. The number of trials having the female voice presented to the right ear was approximately equal to the number of trials having the female voice presented to the left ear. On each trial, the two talkers simultaneously spoke two different sequences of seven single-digit numbers. Each digit (0 – 9) was selected randomly with replacement, and the digit sequence spoken by the single female talker was selected independently from that spoken by the single male talker. Each digit was presented during a 500-ms interval, and consecutive digits were separated by 330-ms interstimulus intervals (ISIs). As described below, the stimulus waveforms used to elicit the nSFOAE response were presented in the ISIs between spoken digits. Fig. 1 shows an example of the speech waveforms presented on a single trial of the dichotic-listening condition.
The subject always was instructed to attend to the ear in which the female was talking, to remember the seven-digit sequence that she spoke, and then to choose a subset of that sequence from one of two choices presented visually on a computer screen at the end of the trial. Each choice of response consisted of five digits, presented simultaneously, and the correct choice always corresponded to the middle five digits spoken by the female talker. The incorrect choice differed from the correct choice by only one randomly mismatched digit, and it also was unrelated to the digit sequence spoken by the male talker.
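The construction of a single dichotic-listening trial can be sketched in a few lines of Python. This is only an illustration, not the LabVIEW software used in the study: the function name `make_trial` and the dictionary keys are hypothetical, and the requirement that the incorrect choice be "unrelated" to the male talker's sequence is interpreted here, as an assumption, to mean that the foil must not equal the middle five digits spoken by the male talker.

```python
import random

def make_trial(rng):
    """Build one hypothetical dichotic-listening trial (illustrative only)."""
    # Seven digits per talker, drawn randomly with replacement and independently.
    female = [rng.randrange(10) for _ in range(7)]
    male = [rng.randrange(10) for _ in range(7)]
    # The correct choice is the middle five digits spoken by the female talker.
    correct = female[1:6]
    while True:
        # The foil differs from the correct choice at exactly one position.
        foil = list(correct)
        pos = rng.randrange(5)
        foil[pos] = rng.choice([d for d in range(10) if d != correct[pos]])
        # Assumption: "unrelated to the male sequence" means the foil must
        # not reproduce the male talker's middle five digits.
        if foil != male[1:6]:
            break
    return {
        "female": female,
        "male": male,
        "correct": correct,
        "foil": foil,
        # Side of the screen showing the correct choice ("4" = left, "6" = right).
        "correct_key": rng.choice(["4", "6"]),
    }

trial = make_trial(random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sketch reproducible; the counterbalancing of the ear receiving the female talker is not modeled here.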
Not shown in Fig. 1 is the silent, 2000-ms response interval that occurred at the end of each trial, during which the subject responded by pressing one of two keys on a keypad (“4” or “6”) to indicate whether the correct series of digits was displayed on the left or the right side of the computer screen, respectively. The “5” key had a raised nipple so that the subject knew where his or her fingers were located without having to look at the keypad. At the beginning of a block of trials, the subject placed his or her index, middle, and ring fingers on the “4,” “5,” and “6” keys, respectively, and maintained that placement through the entire block. This eliminated body and head movements, which would disrupt the nSFOAE recordings. Immediately following the behavioral response, the series of digits selected by the subject was surrounded by an illuminated border, and a 200-ms feedback light indicated which of the two response choices was correct.
2.2.1.1. Diotic condition
The diotic-listening condition was similar to the dichotic-listening condition with the exception that the male and female talkers were presented simultaneously to both ears on each trial, rather than to separate ears. Thus, the dichotic-listening condition required attention to one of two spatial locations (the left or the right ear), whereas the diotic-listening condition required subjects to partition two speech streams that seemed to originate from the same location in space, roughly in the center of the head. It is intuitive that the diotic condition would be more difficult than the dichotic condition, but this did not prove to be true for all subjects.
2.2.1.2. Speech stimuli
One female talker and one male talker were recorded to create the speech stimuli. The ten individual digit waveforms for each talker were fitted to a 500-ms window by aligning the onset of each waveform with the onset of the window, and by adding the appropriate number of zero-amplitude samples (“zeros”) to the end of each waveform to fill the window. The recordings were made using a 50-kHz sampling rate and 16-bit resolution, and were not filtered or processed further before being saved individually to disk. Before presentation, all waveforms were lowpass filtered at 3.0 kHz, and were equalized such that the overall level of each waveform was about 50 dB SPL. These levels were weaker than the levels of the sounds used for evoking the nSFOAE (see below).
2.2.2. Inattention condition
An inattention condition was used for comparison with the dichotic- and diotic-listening conditions just described. This control condition was designed to differ from the selective-attention conditions primarily in the amount of cognitive resources required to perform the behavioral task. During each trial of the inattention condition, series of speech-shaped noise (SSN) stimuli (described below) were presented dichotically to the two ears instead of spoken digits. The SSN stimuli had the characteristics of speech without actually sounding like speech. As in the selective-listening conditions, the SSN stimuli were interleaved with the nSFOAE stimuli, and the timing of the stimulus presentations was the same as for the two selective-listening conditions (see Fig. 1). The subject’s task simply was to press the number “4” on the response keypad at the end of each trial, after the final sound in the stimulus series.
The SSN stimuli were constructed by taking the Fast Fourier Transform (FFT) of each of the 20 speech waveforms used in the auditory-attention conditions and creating 20 samples of noise having the same overall frequency and amplitude spectra, and the same durations. A Hilbert transform was used to extract the envelope from each spoken digit from both talkers. Those envelopes were lowpass filtered at 500 Hz to limit moment-to-moment fluctuations, and then applied to the relevant sample of noise. The resultant waveforms were not intelligible as speech although the stimuli derived from the female talker were noticeably higher in pitch. Similar to the dichotic-listening condition, one ear received a series of seven SSN stimuli derived from the female talker, and the other ear received a series of seven SSN stimuli derived from the male talker. Different series of SSN stimuli were presented on different trials, having been selected randomly with replacement. The ear that received the “female” noise bursts was selected randomly on each trial from a random set of permutations that equated the number of trials in a block during which the “female” noise bursts were presented to the right versus left ear. Although these manipulations were important as controls, they were not important for the subject, who was not required to attend preferentially to any of these stimuli.
2.2.2.1. General
There were nSFOAE-evoking stimuli interleaved with the speech sounds such that every test trial had the potential to yield two physiological responses (see below). To be accepted for averaging, those physiological responses had to meet certain pre-established criteria (see Appendix), but they were evaluated for acceptance only if a key-press response (correct or incorrect) was made during the 2000-ms response interval. When a response failed to meet the criteria for acceptance, an additional trial was added to the block, and subjects received trial-by-trial feedback about this process. By design, every block of trials provided at least 30 trials having both a behavioral response and at least one accepted physiological response. Because sometimes subjects did not produce a key-press within the allotted time, and sometimes the physiological responses did not meet the pre-established criteria, the number of trials necessary to acquire 30 usable trials varied across blocks. The physiological responses obtained on trials having a correct behavioral response were stored separately from the responses obtained on trials having an incorrect response, but in the end, the latter were discarded because they were based on too few trials to be reliable. Thus, the physiological responses reported here are only those obtained from behaviorally correct trials, meaning that, depending upon a subject’s behavioral performance, the physiological responses for a particular block were based on about 20 to 30 trials. After pooling across blocks (described below), the final physiological responses were based on about 80 – 120 trials. Past experience with the nSFOAE procedure (Walsh et al., 2010a, 2010b) revealed that this is sufficient averaging to obtain reliable responses. The typical duration of a block of trials was about 4 – 6 min.
2.3. Physiological measures
The stimuli used to evoke the nSFOAE responses were presented on the same trials used to collect the behavioral responses. The nSFOAE-evoking stimuli were interleaved with the speech or SSN stimuli and were delivered directly to the ears by the same two insert earphone systems used to present the speech or SSN stimuli. For the right ear, two Etymotic ER-2 earphones (Etymotic, Elk Grove Village, IL) were attached to plastic sound-delivery tubes that were connected to an Etymotic ER-10A microphone capsule. The microphone capsule had two sound-delivery ports that were enclosed by the foam ear-tip that was fitted into the ear canal. The nSFOAE responses were elicited by sounds presented by both ER-2 earphones (see below), and were recorded using the ER-10A microphone. No microphone was present in the left ear, just a single ER-2 earphone for presenting the nSFOAE-evoking and speech (or SSN) stimuli. The nSFOAEs and physiological-noise measures always were recorded from the right ear only, but the nSFOAE-evoking stimuli were presented to both ears simultaneously. Accordingly, here the right ear sometimes is referred to as the ipsilateral ear and the left as the contralateral ear.
The acoustic stimuli (spoken digits or SSN sounds, and the nSFOAE-evoking sounds) were presented and the nSFOAE responses were digitized simultaneously using a National Instruments sound board (PCI-MIO-16XE-10) installed in a Macintosh G4 computer. Stimulus presentation and nSFOAE recording both were implemented using custom-written LabVIEW® software (National Instruments, Austin, TX). The sampling rate for both input and output was 50 kHz with 16-bit resolution. The stimulus waveforms were passed from the digital-to-analog converter in the sound board to a custom-built lowpass filter/pre-amplifier before being passed to the earphones for presentation. The analog output of the microphone was passed to an Etymotic preamplifier (20 dB gain), and then to a custom-built amplifier/bandpass filter (14 dB gain, filtered from 0.4 – 15.0 kHz), before being passed to the analog-to-digital converter on the sound board.
2.3.1. The nSFOAE procedure
As noted, the physiological measure used here was a nonlinear version of the stimulus-frequency otoacoustic emission (SFOAE). The SFOAE is a perstimulatory OAE emitted by the cochlea throughout the presentation of a long-duration sound, typically a tone (Kemp, 1978, 1980). The cochlear response is a tone of the same frequency as the input stimulus, but it is much weaker and has a time delay. The input tone and the SFOAE tone sum in the ear canal, and the resultant must be processed in some way to extract the cochlear response (Guinan et al., 2003). Our process for estimating the cochlear response relies on a version of the “double-evoked” procedure described by Keefe (1998), and as a consequence, what was extracted is a nonlinear measure of the cochlear response. Accordingly, we call our measure the nSFOAE (Walsh et al., 2010a) to distinguish it from other SFOAE measures.
In the double-evoked procedure, the acoustic stimulus is presented to the same ear three times per trial (a “triplet”), and three measures of the sound in the ear canal are collected. The first two presentations are repetitions of an acoustic stimulus of exactly the same frequency content, level, duration, and starting phase. In our hands, these two presentations were made using different ER-2 earphones, each calibrated separately after placement in the ear canal. For the third stimulus of each triplet, the two earphones present simultaneously the same exact acoustic stimuli that they previously had presented sequentially. Accordingly, the third stimulus waveform was a more-or-less perfect sum of the first two waveforms, and, here, its level was essentially 6 dB greater than that of either of the first two stimulus presentations.
Recordings of the sound in the ear canal were collected during all three presentations comprising each triplet. As noted, these recordings are the sum of the input stimulus and any cochlear response made to that stimulus, plus any extraneous sounds from subject movement, breathing, or other physiological or ambient noise. In order to extract the nSFOAE, the first and second recordings of a triplet were summed, and the third recording was subtracted from this sum. If only linear processes were operating to produce the individual cochlear responses contained in each of the three recordings, and if the recording system itself was linear, the result of this subtraction would have been near-perfect cancellation; specifically, a resultant waveform whose magnitude was not discriminable from that of the physical noise floor of our measurement system. In fact, when our procedures and equipment were used to “stimulate” a passive cavity (see below), near-perfect cancellation did occur; however, in a human ear canal, the result of the double-evoked subtraction always was a residual waveform whose magnitude did exceed the noise floor. This is the nSFOAE response. The nSFOAE also could be called the residual from linear prediction or the residual from additivity. The nSFOAE exists, in part, because as stimulus level increases, the cochlear response grows more slowly than linear additivity (Bacon et al., 2004). One way of thinking about the nSFOAE is that it represents the amount by which the amplitude of the sum of the first two recordings exceeds the amplitude of the third triplet recording. A stable estimate of the nSFOAE was achieved by averaging the residual waveform obtained from each triplet across multiple triplets in the same block of trials. A primary strength of the double-evoked procedure is that it eliminates the physical stimulus from the residual response.
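The logic of the double-evoked subtraction can be illustrated with a toy simulation. The sketch below is not a model of the cochlea: the functions `linear_ear` and `compressive_ear`, the emission gain, and the square-root compression are all hypothetical choices made only to show that a linear system cancels exactly while a compressively growing response leaves a residual.

```python
import math

def double_evoked_residual(rec1, rec2, rec3):
    """Sum the first two recordings of a triplet and subtract the third."""
    return [a + b - c for a, b, c in zip(rec1, rec2, rec3)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def tone(amplitude, freq_hz=4000.0, fs_hz=50000.0, n=500):
    """A 10-ms, 4.0-kHz tone sampled at 50 kHz."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / fs_hz) for i in range(n)]

def linear_ear(stim, gain=0.01):
    # Hypothetical ear: recording = stimulus + emission proportional to it.
    return [s + gain * s for s in stim]

def compressive_ear(stim, gain=0.01, exponent=0.5):
    # Hypothetical ear: the emission grows more slowly than the stimulus.
    return [s + gain * math.copysign(abs(s) ** exponent, s) for s in stim]

single = tone(1.0)   # first and second presentations
double = tone(2.0)   # third presentation, twice the amplitude (about 6 dB up)

residual_linear = double_evoked_residual(
    linear_ear(single), linear_ear(single), linear_ear(double))
residual_compressive = double_evoked_residual(
    compressive_ear(single), compressive_ear(single), compressive_ear(double))
```

In this toy case `residual_linear` is zero, and `residual_compressive` contains only the emission term; the stimulus itself cancels in both cases, which is the property emphasized in the text.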
The stimulus used here to evoke the nSFOAE always was a long-duration tone presented in wideband noise. The tone was 4.0 kHz, 300 ms in duration, and 60 dB SPL in level. The noise had a bandwidth of 0.1 – 6.0 kHz, was 250 ms in duration, and had an overall level of about 62.7 dB SPL (a spectrum level of about 25 dB, so the signal-to-noise ratio at 4.0 kHz was about 35 dB). The onset of the tone always preceded the onset of the noise by 50 ms. The tone was gated using a 5-ms cosine-squared rise and decay, and the noise was gated using a 2-ms cosine-squared rise and decay. The same random sample of noise was used across all presentations of a triplet, across all triplets, across all conditions, and across all subjects. Consecutive nSFOAE stimuli always were separated by a 500-ms ISI (during which digits were spoken by the two talkers). As noted, the nSFOAE responses obtained during the presentations of the tone and noise (the perstimulatory responses) will be described elsewhere; here and in the companion paper on visual attention (Walsh et al., 2014), the emphasis will be on the nSFOAEs measured during brief silent periods that followed each nSFOAE-evoking stimulus. Specifically, following the simultaneous offset of tone and noise for each presentation was a 30-ms silent period from which a measure of the physiological noise in the ear canal was extracted for each triplet.
The above description of the double-evoked procedure and the resulting nSFOAE response is accurate and necessary for understanding the data presented here. Every trial of every block was analyzed as described above, and that includes the 30-ms silent periods that ended each stimulus presentation. Note that all that was saved for later analysis from each block of trials were the difference responses accumulated across trials for each triplet, not the accumulated responses to each of the three presentations for each triplet.
The double-evoked procedure is not a first choice for measuring cochlear responses during periods of silence because SFOAEs and nSFOAEs are, by definition, perstimulatory responses, and when there is no tone or noise-band present, the double-evoked procedure is unnecessarily indirect. (There were no acoustic stimuli during the silent periods, so there were no SFOAEs behaving nonlinearly across stimulus levels, so the rationale for subtracting two summed responses from a double-amplitude response was gone.) Once any aftereffects of the nSFOAE-evoking stimuli had died out (see below), the double-evoked subtraction of responses obtained during the silent period (likely) was tantamount to summing two independent samples of noise and subtracting a third independent sample of noise of the same approximate magnitude; that is, summing three independent samples of noise. A simpler procedure would have been to accumulate and store separately all three responses of each triplet, not just the difference responses trial-by-trial. However, when this study was designed, we did not anticipate that the silent periods would yield interesting attentional effects; they were included only for calibration purposes. As noted, our primary interest initially was in the perstimulatory responses, for which the double-evoked procedure was an appropriate choice, and, as will be reported elsewhere, that procedure did yield measures that differed across attentional conditions during the perstimulatory periods, just as for the silent periods. These were experimental differences that could not have been created by the double-evoked procedure itself.
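The equivalence between the double-evoked subtraction and the summing of three noise samples can be checked numerically. The sketch below assumes, for illustration only, that the three silent-period recordings are independent, zero-mean Gaussian samples of equal variance. Under that assumption the sign of the third term does not affect the power, and the residual should be about 10 log10(3), or roughly 4.8 dB, above the level of any single sample.

```python
import math
import random

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def level_db(x):
    """Level in dB relative to an arbitrary unit reference."""
    return 20 * math.log10(rms(x))

rng = random.Random(1)
n = 20000
# Three independent, equal-variance noise samples (an assumption).
rec1, rec2, rec3 = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)]

# Double-evoked subtraction applied to pure noise.
residual = [a + b - c for a, b, c in zip(rec1, rec2, rec3)]

increase_db = level_db(residual) - level_db(rec1)
expected_db = 10 * math.log10(3)
```

Because this offset is the same in every condition under the stated assumption, it would not by itself produce a difference between attention conditions.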
2.4. Behavior and physiology
2.4.1. Selective auditory-attention conditions
For the auditory-attention conditions, the stimuli used to evoke the nSFOAE responses were interleaved either with the speech stimuli used for the dichotic- and diotic-listening conditions, or with the SSN stimuli used for the inattention condition. To illustrate this arrangement, Fig. 2 shows an example of all the acoustic waveforms presented to the ears during a single trial of the dichotic-listening condition. The trace at the top of Fig. 2 contains the attended speech series (female talker) plus the interleaved nSFOAE-evoking stimuli, and the bottom trace contains the unattended speech series (male talker) plus the identical interleaved nSFOAE-evoking stimuli. On every trial, two triplets were presented consecutively; thus, the third and sixth nSFOAE-evoking stimuli were twice the amplitude of the other nSFOAE-evoking stimuli in the series. Although nSFOAE responses always were measured from the right ear only, the same nSFOAE-evoking stimulus series also was presented to the left ear on every trial. Recall that on one-half of the dichotic-listening trials the subject was listening to the female voice in the left (contralateral) ear while the nSFOAE response was measured in the right ear, whereas in the diotic-listening condition, the same speech stimuli and the nSFOAE-evoking stimuli were presented simultaneously to both ears. Fig. 2 also illustrates that the speech stimuli (about 50 dB SPL each) were weak relative to the nSFOAE stimuli (about 60 dB SPL each). Thus, in order to perform the dichotic- and diotic-listening tasks, subjects had to attend selectively to relatively weak speech sounds in the gaps between relatively strong bursts of tone-plus-noise. Note again that the physiological responses of interest in this report are those collected during the 30-ms silent periods at the end of each nSFOAE-evoking stimulus (the small open rectangles in Fig. 2).
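The timing of one trial can be laid out explicitly. The sketch below assumes that a trial begins with the first digit interval, that digit intervals and nSFOAE intervals alternate with no additional gaps, and that each 330-ms nSFOAE interval comprises the 300-ms stimulus plus the 30-ms silent period; the function name and event labels are illustrative.

```python
DIGIT_MS = 500      # duration of each spoken-digit interval
NSFOAE_MS = 330     # 300-ms tone-plus-noise stimulus + 30-ms silent period
RESPONSE_MS = 2000  # response interval at the end of the trial
N_DIGITS = 7

def trial_schedule():
    """Return (start_ms, end_ms, label) events for one trial (assumed layout)."""
    events = []
    t = 0
    for i in range(1, N_DIGITS + 1):
        events.append((t, t + DIGIT_MS, "digit %d" % i))
        t += DIGIT_MS
        if i < N_DIGITS:
            # The third and sixth nSFOAE stimuli are the double-amplitude
            # members of the two triplets.
            kind = "double" if i % 3 == 0 else "single"
            events.append((t, t + NSFOAE_MS, "nSFOAE %d (%s)" % (i, kind)))
            t += NSFOAE_MS
    events.append((t, t + RESPONSE_MS, "response"))
    return events

schedule = trial_schedule()
stimulus_ms = schedule[-1][0]   # 7 * 500 + 6 * 330 = 5480 ms
trial_ms = schedule[-1][1]      # 5480 + 2000 = 7480 ms
```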
2.4.2. Physiological-noise measure
The specific procedures used to obtain our physiological-noise measure are shown in Fig. 3. For each block of trials, the responses from about 20 – 30 trials having correct behavioral responses were sorted and averaged for each of the two triplets per trial. An example of an averaged, unfiltered nSFOAE response obtained for one of the triplets is shown at the top of the figure. The initial 50 ms of the perstimulatory waveform is the response to the 4.0-kHz tone presented alone, and the final 250 ms is the response to that same tone presented in wideband noise. Immediately following the nSFOAE response was the 30-ms silent period, delineated here by an open rectangle, from which our physiological-noise measures were obtained. Following data collection, each average nSFOAE response was analyzed offline by passing the 330-ms raw waveform through a succession of 10-ms rectangular windows, beginning at the onset of the response to the tone in quiet and continuing in 1-ms increments until the end of the silent period. At each step, the 10-ms waveform segment was bandpass filtered at some center frequency (typically with a passband of 3.8 – 4.2 kHz because the tonal signal used during the perstimulatory period to elicit the nSFOAE was 4.0 kHz) using a 6th-order, elliptical digital filter, the rms amplitude of that filtered waveform was calculated, and the result was converted to decibels sound-pressure level (dB SPL). As illustrated at the bottom of Fig. 3, here we will emphasize the final succession of 10 windows during the silent period, called the asymptotic physiological response (enclosed by the long rectangle). To estimate these asymptotic values, the sound-pressure levels from the last 10 available analysis windows were averaged (the levels of the 10 windows were averaged; the individual responses were not pooled). The decline in magnitude of the physiological response during the first few milliseconds of the silent period is discussed below.
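The windowed analysis can be sketched as follows. This is a simplified illustration, not the analysis software used in the study: the bandpass step is represented by an optional, caller-supplied function (`bandpass`, hypothetical) rather than by an actual 6th-order elliptical filter, the waveform is assumed to be calibrated in pascals, and the standard 20-µPa reference is used for dB SPL.

```python
import math

P_REF_PA = 20e-6  # standard reference pressure for dB SPL

def rms(seg):
    return math.sqrt(sum(v * v for v in seg) / len(seg))

def db_spl(p_rms):
    return 20 * math.log10(p_rms / P_REF_PA) if p_rms > 0 else float("-inf")

def windowed_levels(waveform_pa, fs_hz, window_ms=10.0, step_ms=1.0, bandpass=None):
    """Level (dB SPL) of successive 10-ms rectangular windows in 1-ms steps."""
    win = int(round(window_ms * fs_hz / 1000.0))
    step = int(round(step_ms * fs_hz / 1000.0))
    levels = []
    for start in range(0, len(waveform_pa) - win + 1, step):
        seg = waveform_pa[start:start + win]
        if bandpass is not None:
            seg = bandpass(seg)  # placeholder for the elliptical bandpass filter
        levels.append(db_spl(rms(seg)))
    return levels

def asymptotic_level(levels, n_last=10):
    """Average the dB levels of the last n_last windows (levels, not pressures)."""
    tail = levels[-n_last:]
    return sum(tail) / len(tail)

# Demonstration on a synthetic 330-ms, 4.0-kHz tone with an rms of 2e-4 Pa,
# which corresponds to 20 dB SPL.
fs = 50000.0
amp = 2e-4 * math.sqrt(2.0)
demo = [amp * math.sin(2 * math.pi * 4000.0 * i / fs) for i in range(16500)]
levels = windowed_levels(demo, fs)
asymptote = asymptotic_level(levels)
```

Averaging the decibel values of the final windows, rather than pooling the underlying pressures, follows the description in the text.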
2.4.3. Off-frequency measures
Once the initial analyses had been made at 4.0 kHz, it was clear that analyses at other frequencies would be informative. For consistency, it was desirable to conduct these additional analyses using the same 10% bandwidth used at 4.0 kHz, but this raised a problem. The rise time of the digital filter would be a different fraction of our 10-ms analysis window at the different center frequencies, meaning that the true levels of the physiological noise would be increasingly underestimated the lower the frequency. Our solution was to estimate correction factors for each center frequency examined. Specifically, for each frequency of interest, a 10-ms sample was obtained from a steady-state pure tone using the same procedures, software, and 6th-order elliptical filter set for a 10% bandwidth, as were used to analyze the physiological responses. The rms output of the filter then was compared with the actual rms of the input waveform, and a correction factor was calculated in decibels. These corrections ranged from 21.5 dB at 1.1 kHz to 1.5 dB at 7 kHz (2.46 dB at 4.0 kHz). All noise levels reported in this and the companion paper (Walsh et al., 2014) have been adjusted by these frequency-dependent correction factors; the values in Walsh et al. (2010a, 2010b) were not so corrected. Note that these adjustments had no effect on whatever differences existed between the physiological responses obtained from different attention conditions, because the same correction was applied to every condition at a given frequency.
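The logic of the correction factor can be illustrated with a short sketch. Note the assumptions: a simple second-order bandpass (a textbook biquad with unity gain at its center frequency) stands in for the 6th-order elliptical filter, and the sampling rate is hypothetical, so the numbers it prints will not match the corrections reported above. Only the procedure (filter a 10-ms steady-state tone, compare output rms with input rms, express the ratio in decibels) follows the text.

```python
import math


def biquad_bandpass(x, f0, fs, q):
    """Second-order bandpass with unity gain at f0; filter state starts at rest."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def rms(v):
    return math.sqrt(sum(s * s for s in v) / len(v))


def correction_db(f0, fs=50000, win_ms=10.0, rel_bw=0.10):
    """dB by which the filtered 10-ms tone underestimates the input rms."""
    n = int(round(fs * win_ms / 1000.0))
    tone = [math.sin(2.0 * math.pi * f0 * i / fs) for i in range(n)]
    filtered = biquad_bandpass(tone, f0, fs, q=1.0 / rel_bw)  # 10% bandwidth
    return 20.0 * math.log10(rms(tone) / rms(filtered))


for f0 in (1100, 4000, 7000):
    print(f0, "Hz:", round(correction_db(f0), 2), "dB")
```

Even with this simplified filter, the correction grows as the center frequency falls, because a filter with a fixed relative bandwidth has a longer rise time at lower frequencies.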
2.5. Data analysis
Although not mentioned previously, data collection and analysis were slightly different for the three experimental conditions. For the diotic-listening condition, the physiological responses that satisfied the criteria for acceptance (see Appendix) were pooled across all trials in that block having the same behavioral response. That is, at the end of each block of trials, there were four physiological measures: an averaged physiological response for all the trials having correct behavioral responses, all those having incorrect behavioral responses, and each of those separately for triplets 1 and 2. However, for the dichotic-listening and inattention conditions, there were eight physiological measures at the end of each block of trials. The reason is that the four measures just described for the diotic condition were kept separately for trials on which the female voice (or female-derived speech-shaped sounds) was in the right ear (which had the recording microphone; the ipsilateral ear) or the left (contralateral) ear. The purpose of this procedure was to allow a test of the logical possibility that the amount of efferent activity differs in ears having, and not having, the targeted, female voice. However, the use of this procedure meant that the number of individual waveforms contributing to the physiological response in the diotic condition was about double that in the dichotic and inattention conditions. A solution emerged when we found no systematic difference in the physiological responses from the ipsilateral and contralateral ears, within either subjects or conditions (an unexpected result). This allowed us to sum the raw contralateral responses with the raw ipsilateral responses for the dichotic and inattention conditions (still using only behaviorally correct trials), which essentially equated the number of individual trials for all conditions.
Data were averaged in another way for another purpose. Namely, each subject completed each of the experimental conditions at least 4 times (M = 4.6). The averaged response waveform from each block was passed through the moving-window filter analysis and the values obtained were converted to dB SPL. The resulting moving-window analyses were pooled (averaged) to yield a single estimate of the physiological noise for each subject and each condition. In order to estimate the asymptotic level of the response at the end of the silent period (see Fig. 3), the levels in the final 10 analysis windows of the pooled response were averaged. The end result was similar numbers of individual trials contributing to the averaged responses for all three experimental conditions for each subject – typically between 80 and 120 trials (all having correct behavioral responses).
2.5.1. Statistical measures
Physiological-noise magnitudes from the selective-attention and inattention conditions were compared using analyses of variance (ANOVA), and measures of effect size (d; see Cohen, 1992). Here, effect size is the difference between the means of two distributions of data divided by an estimate of the pooled standard deviation across the two distributions (see Eq. 1). By convention, effect sizes between 0.20 – 0.50 are considered to be small, effect sizes between 0.50 – 0.80 are considered to be medium, and effect sizes greater than 0.80 are considered to be large (Cohen, 1992).
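Eq. 1: $d = \dfrac{M_1 - M_2}{s_{\mathrm{pooled}}}, \qquad s_{\mathrm{pooled}} = \sqrt{\dfrac{s_1^2 + s_2^2}{2}}$, where $M_1$ and $M_2$ are the means, and $s_1$ and $s_2$ the standard deviations, of the two distributions.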
3. RESULTS
Behaviorally, subjects performed well on the digit-recognition task in the selective-attention conditions, indicating that they were attending reliably to the correct stimuli. This is an important outcome to consider when interpreting the differences between physiological-noise levels in the attention and inattention conditions. In the two auditory-attention conditions, subjects performed at 86.0% correct on average (range = 73.0% – 98.5%). There was no statistical difference in behavioral performance across the dichotic- and diotic-listening conditions.
Although the emphasis in this paper is on the physiological responses during the silent periods, there is value in providing the reader with some information about the general pattern of response seen during the preceding perstimulatory period. Accordingly, Fig. 4 shows the nSFOAE response for one subject across the full time course of a stimulus presentation, using the analysis procedure just described. As can be seen, there is a small and essentially constant nSFOAE during the 50 ms of tone-alone, a short hesitation at the onset of the wideband noise, a rising, dynamic response lasting about 100 ms, and then an apparently asymptotic response lasting throughout the course of the tone-plus-noise. This is the same response pattern reported in Walsh et al. (2010a) to essentially the same stimuli. In accord with Guinan (e.g., Guinan et al., 2003), the interpretation is that tone-alone was not effective in triggering an efferent response, but the wideband noise was; the efferent response takes about 100 ms to reach its maximum; it then remains essentially constant throughout the course of the activating sound (and it persists for several hundred milliseconds thereafter). What is new in this figure is the difference in the nSFOAE magnitudes for the inattention and attention conditions.
Fig. 4 also shows the focus of the current paper, the silent period. As shown, physiological responses still could be measured after the offset of the tone-plus-noise, and those responses still showed a difference for these two attentional conditions. The responses during the silent period were invariably smaller in the selective-attention conditions than in the inattention conditions, and this was true for every subject and for both the auditory- and visual-attention tasks (see Walsh et al., 2014). Before showing the comparisons between the attention and inattention conditions in detail for the silent period, we describe several equivalences in the data: between ipsilateral and contralateral measurements, between the two triplets, and between dichotic- and diotic-listening conditions.
3.1. Comparison of ipsilateral and contralateral measures
The physiological-noise levels measured in the right ear were essentially the same whether the female voice being attended to was in the right or the left ear—the ipsilateral and contralateral trials, respectively. The data for both triplets of the dichotic-listening condition are shown in Table 1. To review, each entry is based on the following: for each block of trials, each subject provided one averaged physiological response for ipsilateral trials on which the behavioral response was correct, and one averaged physiological response for contralateral trials on which the behavioral response was correct—for each triplet. For each of those responses, the levels in the 10 analysis windows beginning 10 ms into the silent period were averaged window-by-window, and those values were averaged with the levels obtained from at least three other blocks of trials of the same condition. These results we call the asymptotic levels of the physiological noise. Larger negative values indicate a smaller physiological-noise measurement – a quieter recording.
Table 1.
Subject | Triplet 1: Ipsilateral | Triplet 1: Contralateral | Triplet 2: Ipsilateral | Triplet 2: Contralateral |
---|---|---|---|---|
L01 | −9.5 | −10.6 | −10.1 | −9.0 |
L02 | −10.1 | −10.1 | −9.4 | −9.8 |
L03 | −11.2 | −10.8 | −10.8 | −11.4 |
L04 | −10.8 | −11.1 | −10.5 | −11.0 |
L05 | −11.4 | −11.0 | −9.7 | −11.7 |
L06 | −10.3 | −11.4 | −12.3 | −8.9 |
L07 | −8.6 | −7.9 | −8.2 | −10.4 |
L08 | −10.8 | −10.7 | −10.6 | −10.9 |
Mean | −10.3 | −10.4 | −10.2 | −10.4 |
Std. Dev. | 0.9 | 1.1 | 1.2 | 1.1 |
Effect Size (Ipsilateral − Contralateral) | Triplet 1: 0.1 | | Triplet 2: 0.2 | |
Values are asymptotic physiological-noise levels in dB SPL. Each entry is a window-by-window mean first across the final ten 10-ms windows of the silent period and then across at least four 30-trial blocks, corresponding to about 80 – 120 trials (correct only) averaged for each physiological response.
As the means and effect sizes at the bottom of Table 1 reveal, the asymptotic levels were essentially identical for the ipsilateral and contralateral measures for both triplets. This was an unexpected outcome because, for us, the wiring of the olivocochlear system (Brown, 2011) always has suggested that “ear-switching” is a likely function of efferent activation (e.g., Cherry, 1953). The data in Table 1 also reveal that there were no differences in the physiological noise levels across the two triplets.
For the inattention condition, just as for the dichotic-listening condition, the physiological responses from the right ear were averaged separately depending upon whether the SSN stimuli based on the female-spoken digits were presented to the right or left ear. This was done even though the subjects were unaware that the SSN stimuli in one ear were based on the female voice and those in the other ear on the male voice, and even though they were not required to attend to those SSN stimuli. The results were similar to those in Table 1; the means calculated either across ears or triplets differed by less than 1 dB. (No systematic differences between the ipsilateral and contralateral measures, or between the two triplets, were seen in the visual-attention data either; Walsh et al., 2014.) Accordingly, we conclude that the ipsilateral and contralateral measures from the silent period are essentially equivalent, as are the triplet 1 and triplet 2 measures. So, in all of the analyses reported below, the ipsilateral and contralateral data are averaged, within subjects and window-by-window, as a way to improve the reliability of our measures. Also, for simplicity, often only data for triplet 1 are presented.
3.2. Comparison of diotic and dichotic conditions
The two auditory-attention conditions in this study involved either dichotic or diotic stimulus presentations. Phenomenologically (for the authors, at least, if not for the highly practiced subjects), the dichotic condition was easier because the female and male voices originated from different spatial locations. However, the behavioral data were not systematically different for those two listening conditions. Were the physiological-noise measures also similar for the dichotic and diotic conditions? The short answer is yes; there were no systematic differences between the dichotic and diotic auditory-attention conditions. Accordingly, for the remainder of the Results section, only the dichotic data are presented. The details of the comparison between the diotic- and dichotic-attention conditions are in the Appendix (section 6.2).
3.3. Comparison of inattention and attention conditions
This brings us to the central question motivating this research: Were the physiological-noise levels different when subjects were attending, or not attending, to the spoken digits? The answer is yes. In Table 2 we show the window-by-window averages for the ten analysis windows beginning 10 ms into the silent period (the asymptotic values) for both the inattention and dichotic-attention conditions, and for both triplets. Examination of Table 2 reveals that for every subject, and for both triplets, the physiological-noise magnitudes always were larger (noisier) in the inattention condition, and smaller (quieter) in the dichotic-listening condition. The effect sizes for the differences between inattention and dichotic attention were greater than 2.0 for both triplets. A two-way univariate ANOVA, with experimental condition and triplet as the two factors, revealed a significant main effect of condition on the average noise magnitudes [F(1, 28) = 45.8, p < 0.0001], but the main effect of triplet was not significant [F(1, 28) = 1.2, p = 0.3], nor was the interaction of condition and triplet [F(1, 28) = 1.1, p = 0.3]. Again, similar results were obtained for the diotic-attention condition.
Table 2.
Subject | Triplet 1: Inattention | Triplet 1: Dichotic | Triplet 2: Inattention | Triplet 2: Dichotic |
---|---|---|---|---|
L01 | −6.2 | −10.0 | −8.9 | −9.5 |
L02 | −8.8 | −10.1 | −8.3 | −9.6 |
L03 | −6.7 | −11.0 | −6.3 | −11.1 |
L04 | −8.6 | −11.0 | −10.0 | −10.7 |
L05 | −8.4 | −11.2 | −9.0 | −10.7 |
L06 | −8.5 | −10.9 | −9.0 | −10.6 |
L07 | −5.6 | −8.3 | −7.0 | −9.3 |
L08 | −7.6 | −10.8 | −8.2 | −10.7 |
Mean | −7.5 | −10.4 | −8.3 | −10.3 |
Std. Dev. | 1.2 | 1.0 | 1.2 | 0.7 |
Effect Size (Inattention − Dichotic) | Triplet 1: 2.6 | | Triplet 2: 2.2 | |
Values are asymptotic physiological-noise levels in dB SPL. Each entry is a window-by-window mean first across the final ten 10-ms windows of the silent period and then across at least four 30-trial blocks, corresponding to about 80 – 120 trials (correct only) averaged for each physiological response.
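As a check on the effect-size calculation, the triplet-1 entries of Table 2 can be run through a short script. This is a sketch under a stated assumption: the pooled standard deviation is taken as the root mean of the two sample variances, which is the usual form when the two groups are the same size. The function name is ours.

```python
import math
from statistics import mean, stdev

# Triplet-1 asymptotic noise levels (dB SPL) for subjects L01-L08, from Table 2.
inattention = [-6.2, -8.8, -6.7, -8.6, -8.4, -8.5, -5.6, -7.6]
dichotic = [-10.0, -10.1, -11.0, -11.0, -11.2, -10.9, -8.3, -10.8]


def cohens_d(a, b):
    """Difference of means divided by the pooled standard deviation.

    Assumes equal group sizes, so the pooled variance is the simple
    average of the two sample variances.
    """
    pooled_sd = math.sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2.0)
    return (mean(a) - mean(b)) / pooled_sd


d = cohens_d(inattention, dichotic)
print(round(d, 1))  # 2.6, in agreement with the triplet-1 effect size in Table 2
```

Because the tabled values are rounded to one decimal place, small discrepancies from the published effect sizes are to be expected for other columns.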
Our interpretation (elaborated below) is that the cortico-olivo and medial olivocochlear components of the efferent system were more strongly activated during the selective-attention conditions than during the inattention condition, and that those differences in activation persisted into and throughout the silent period (see Backus and Guinan, 2006; Walsh et al., 2010a, 2010b).1
3.4. Supplementary analyses
3.4.1. Physical-noise measures
In order to test the possibility that the observed differences in the magnitudes of physiological noise were due to an unappreciated procedural difference across the three conditions, an additional calibration was conducted. The acoustic stimuli were played to, and recorded from, a syringe (a non-human, passive cavity) whose volume was approximately 0.5 cm³, using exactly the same equipment, software, and procedures as were used with the human subjects. Responses were collected for three blocks of trials for each (human) condition of listening. The data from these physical-noise recordings were averaged and analyzed in the same way as the physiological-noise recordings. The asymptotic “responses” for triplet 1 from the silent periods are shown on the right side of Fig. 5. Across the experimental conditions, the average physical-noise magnitudes were highly similar to one another (averaging about −15.7 dB at 4.0 kHz). For comparison, the physiological-noise data from triplet 1 are shown on the left side of Fig. 5. The strong implication is that the differences in the human data observed in the attention and inattention conditions were not attributable to inconsistencies or artifacts in the software or procedures used.
3.4.2. Initial decline
The data in Table 2 and Fig. 5 were obtained by averaging the levels from the last ten 10-ms windows at the end of the 30-ms silent period, and thus represent an asymptotic level of the physiological noise in the ear canal. For some subjects, the physiological responses at the beginning of the silent period were stronger than at asymptote, and they underwent a decline during the first few milliseconds of the silent period. For other subjects there was little or no decline. Because averaging the responses across all subjects would have misrepresented the individual data, we partitioned the subjects into two groups prior to averaging. The top panel of Fig. 6 shows across-subject averages for the three subjects exhibiting little or no decline in response magnitude during the silent period, and the bottom panel of Fig. 6 shows the same for the five subjects who did exhibit a decline. The averages across subjects shown in each panel are for each of the twenty 10-ms windows spanning the entire 30-ms silent period. The dashed lines in the bottom panel show the levels of physical noise measured in a passive cavity for each condition. Only triplet 1 is illustrated because the data from triplet 2 were similar.
We believe that the ears exhibiting decline were emitting echo-like responses to the tone-plus-noise stimuli that ended just before the silent periods. That is, we believe the declines represent decaying nSFOAEs after stimulus offset. All the subjects exhibited strong perstimulatory nSFOAE responses, but for the subgroup in the top panel of Fig. 6, those responses had declined to the asymptotic physiological noise floor by the end of the 5-ms decay of the tone-plus-noise stimulus, especially in the inattention condition. For both groups, more decline was evident for the dichotic-attention condition. Note that the differences between the inattention and dichotic-attention conditions present in the final moments of the 30-ms silent period (Fig. 5 and Table 2) also were present, or were beginning to emerge, soon after the onset of the silent period. When the asymptotic noise power was subtracted out for each condition, the difference between inattention and dichotic-attention still was present for both groups at the beginning of the silent period (not shown), and this difference was about 3 dB for both groups of subjects.
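Subtracting the asymptotic noise power from a level expressed in decibels has to be done in the power domain, not by subtracting decibel values directly. A minimal sketch of that step follows; the function name and the example levels are hypothetical, and the sketch assumes the decaying response and the noise are uncorrelated so that their powers add.

```python
import math


def subtract_noise_db(total_db, noise_db):
    """Level (dB) that remains after removing noise power from a total level.

    Both inputs are levels in dB re the same reference. Powers are assumed
    to add, i.e. the signal and noise are taken to be uncorrelated.
    """
    total_pow = 10.0 ** (total_db / 10.0)
    noise_pow = 10.0 ** (noise_db / 10.0)
    if total_pow <= noise_pow:
        raise ValueError("total level must exceed the noise level")
    return 10.0 * math.log10(total_pow - noise_pow)


# Hypothetical example: a total level 3.01 dB above the noise floor implies
# a residual component whose level equals that of the noise floor itself.
noise_floor = -10.4
residual = subtract_noise_db(noise_floor + 10.0 * math.log10(2.0), noise_floor)
print(round(residual, 2))
```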
3.4.3. Correlations during initial decline
If the early portions of the noise-like waveforms we obtained during the silent periods were, in part, after-effects of the tone-plus-noise stimuli preceding them (a decaying SFOAE), then the fine structure of the averaged responses obtained at similar times post-stimulus ought to be similar for triplets 1 and 2. To test this implication, we calculated within-subject correlation coefficients for pairs of filtered, averaged responses beginning at corresponding moments during the silent periods from triplets 1 and 2 – namely, 10-ms segments beginning 1, 2, 3, etc., ms after the offset of the tone-plus-noise. The calculations were averaged separately within the same two groups of subjects as used in the previous section: those who did, or did not, exhibit a decline in physiological-noise level during the first few milliseconds of the silent period. The results were qualitatively the same for the two groups. The correlations between the fine structures of the corresponding averaged responses from the two triplets were 0.8 or greater for the first 10-ms analysis window of the silent period and then gradually declined with successive advancements of the window. The primary difference between the two groups of subjects was that the five subjects showing an initial decline in physiological-noise level had correlations that stayed high longer as the analysis window was advanced into the silent period. Specifically, when the correlations had fallen to about 0.0 for the no-decline group (at about 307 ms), they still were about 0.4 for the with-decline group. For both groups, the declines in correlation obtained were similar for both the attention and inattention conditions. All these outcomes were obtained with the analysis filter centered at 4.0 kHz, but a similar pattern of results was obtained when the analysis filter was moved to other frequencies.
These results support the interpretation that the energy seen during the first few milliseconds of the silent period consisted largely of decaying SFOAEs evoked by the tone-plus-noise stimulus, and that the energy seen during the final milliseconds was random in nature (i.e., noise).
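The running between-triplet correlation can be sketched as follows. The data here are synthetic and purely illustrative: each "triplet" is a shared, exponentially decaying 4.0-kHz component plus independent Gaussian noise. The sampling rate, decay constant, noise level, and function names are all assumptions chosen only to show the expected pattern.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def running_correlation(resp1, resp2, fs, win_ms=10.0, step_ms=1.0):
    """Correlation of corresponding windows as the window advances in time."""
    win = int(round(fs * win_ms / 1000.0))
    step = int(round(fs * step_ms / 1000.0))
    return [pearson_r(resp1[s:s + win], resp2[s:s + win])
            for s in range(0, len(resp1) - win + 1, step)]


# Synthetic 30-ms "silent periods": a shared decaying tone plus independent noise.
random.seed(1)
fs = 50000                      # assumed sampling rate in Hz
n = 30 * fs // 1000
tau = 0.003                     # assumed decay time constant, in seconds
shared = [math.exp(-(i / fs) / tau) * math.sin(2.0 * math.pi * 4000.0 * i / fs)
          for i in range(n)]
triplet1 = [s + random.gauss(0.0, 0.05) for s in shared]
triplet2 = [s + random.gauss(0.0, 0.05) for s in shared]

rs = running_correlation(triplet1, triplet2, fs)
print(round(rs[0], 2), round(rs[-1], 2))
```

In this toy case the correlation is high while the shared decaying component dominates and falls toward zero once only the independent noise remains, which is the pattern the interpretation above predicts.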
3.4.4. Off-frequency measurements
In addition to measuring physiological-noise magnitudes at the frequency of the 4.0-kHz tone used to elicit the nSFOAE, noise measures also were obtained at a number of other frequencies. The specific frequencies were selected for different reasons; some represented peaks or valleys in the spectrum of the last 20 ms of the noise sample used, while others were chosen to fill voids in the spectral set. In the end, the outcomes did not differ according to the original basis for choice of the individual frequencies.
For each subject and each condition, the averaged waveform for triplet 1 of every trial was bandpass filtered symmetrically around each selected frequency, using a filter whose bandwidth was 10% of that frequency. A correction factor then was applied to the magnitude at each center frequency to account for the differential rise times of the different digital filters. Physiological-noise magnitudes at each frequency were calculated by averaging the last ten available data points in the silent period (from 310 – 319 ms), just as was done for the measures at 4.0 kHz. In Fig. 7, the physiological-noise magnitudes averaged across subjects in the inattention and dichotic-listening conditions are plotted as a function of frequency. The symbols for the two experimental conditions are the same as those used in Fig. 6 above.
The data in Fig. 7 show that noise magnitudes were largest at the lowest and highest frequencies tested, and were smaller at intermediate frequencies; the smallest noise magnitudes were measured between about 2 – 4 kHz. At every frequency selected, physiological-noise magnitudes were higher in the inattention condition and lower in the dichotic-listening condition—the same pattern of results that was observed at 4.0 kHz (those data are shown again in Fig. 7 for comparison). Interestingly, the noise magnitudes at 4.0 kHz—the frequency of the probe tone used to measure the nSFOAE during the perstimulatory period—were not noticeably different from those at neighboring frequencies. Measurements at the two highest frequencies shown in Fig. 7 (6.5 and 7.0 kHz) were made outside the passband of the frozen sample of wideband noise that was presented simultaneously with the probe tone, yet the outcomes were essentially the same. These data reveal that the mechanism responsible for the marked differences across our attention conditions operated across a wide spectral region.
In addition to the human data, Fig. 7 contains two lines for purposes of comparison. The dashed line shows the spectrum of the noise measured in a syringe using the same equipment and procedures as used for the human data. (The published frequency response of the Etymotic ER-10A microphone varies less than 3 dB over the entire frequency range shown.) The solid line has a slope of 3 dB per octave, which is the rise expected in the human and syringe measurements attributable to the use of the 10% filter bandwidth when analyzing the data. Note that the larger differences between the human and syringe data at low frequencies than at high frequencies in Fig. 7 likely are attributable to factors such as breathing, swallowing, and other essential sounds in the human subjects.
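The 3-dB-per-octave slope follows from a short calculation, shown here under the assumption (ours, for illustration) that the noise has a flat power spectral density $N_0$ in the region of interest. With a filter bandwidth equal to 10% of the center frequency $f$, the power passed by the filter grows in proportion to $f$:

```latex
\begin{align}
B(f) &= 0.10\,f, \qquad P(f) = N_0\,B(f) = 0.10\,N_0\,f \\
L(f) &= 10\log_{10} P(f) \\
L(2f) - L(f) &= 10\log_{10}\frac{0.10\,N_0\,(2f)}{0.10\,N_0\,f}
             = 10\log_{10} 2 \approx 3.01\ \text{dB}
\end{align}
```

Any departure of the measured spectra from this slope therefore reflects the shape of the underlying noise spectrum and not the analysis bandwidth.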
The physiological-noise magnitudes in Fig. 7 were compared across the inattention and dichotic-attention conditions using a two-way univariate ANOVA, with experimental condition and center frequency (of the bandpass filter) as the two factors. Significant main effects were revealed for both condition [F(1, 168) = 54.4, p < 0.0001] and center frequency [F(11, 168) = 75.5, p < 0.0001], but not for the interaction of the two factors [F(11, 168) = 0.2, p = 1.0].
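The reported degrees of freedom are consistent with a fully crossed two-way layout having eight observations (one per subject) in each cell. The sketch below does only that bookkeeping; the layout is an assumption on our part, since the text does not describe the ANOVA model in more detail, and the function name is hypothetical.

```python
def two_way_anova_df(levels_a, levels_b, n_per_cell):
    """Degrees of freedom for a fully crossed two-way ANOVA with equal cell sizes."""
    total_n = levels_a * levels_b * n_per_cell
    return {
        "A": levels_a - 1,
        "B": levels_b - 1,
        "AxB": (levels_a - 1) * (levels_b - 1),
        "error": total_n - levels_a * levels_b,
    }


# 2 conditions x 12 center frequencies x 8 subjects
print(two_way_anova_df(2, 12, 8))
# 2 conditions x 2 triplets x 8 subjects (the analysis of Table 2)
print(two_way_anova_df(2, 2, 8))
```

Under these assumptions the error term has 168 degrees of freedom for the frequency analysis and 28 for the triplet analysis, matching the F ratios reported in the text.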
In passing, we note that it probably is incorrect to assume that the noise measurements shown for the humans at each frequency in Fig. 7 originated from single, separate “characteristic places” along the basilar membrane. Rather, it is likely that the level measured at each frequency represents a sum across a number of reflection sites in the general vicinity of those characteristic places (see Shera, 2003).
3.4.5. Initial declines across frequency
The rates of decline of the physiological responses (Fig. 6) also were measured at the frequencies shown in Fig. 7. Fig. 8 shows noise magnitudes in the inattention and dichotic-attention conditions at four frequencies as a function of time from the start of the 30-ms silent period. The four frequencies shown are the lowest and highest that were tested (1.1 and 7.0 kHz), plus two intermediate frequencies (3.2 and 5.3 kHz). These plots are averages across all eight subjects because here the individual differences in magnitude of decline were smaller than those seen at 4.0 kHz (Fig. 6) (perhaps because at frequencies other than 4.0 kHz, the response more closely reflects the bandwidth of the measurement filter, whereas at 4.0 kHz the response is more nearly tonal). For the data measured at 1.1, 3.2, and 5.3 kHz, just as for 4.0 kHz (Fig. 6), initial declines in noise level were observed across the two conditions. The highest frequency, 7.0 kHz, showed no initial decline, presumably because that frequency was outside the passband of our MOC-eliciting noise.
Four general results emerged from the analysis of the data across multiple frequencies: (1) the time-course of the decline of the physiological response became progressively shorter as frequency was increased, (2) there was no initial decline at frequencies above the passband of the wideband noise, (3) the difference between the inattention and dichotic-attention conditions was smaller at the lower frequencies, and (4) the temporal and asymptotic effects seen at 4.0 kHz were not different from the effects seen at neighboring frequencies, suggesting that the 4.0-kHz tone played no unique role in the effects seen during the silent period. These facts confirm the assumption that the initial declines are attributable to echo-like responses (decaying SFOAEs) to the various frequencies present in the tone-plus-noise stimulus that terminated just prior to the silent period. Presumably the declines took longer at lower frequencies because of longer intra-cochlear travel times, and it may be that the smaller attention effect at low frequencies is attributable to the reduced density of efferent innervation at the apical end of the cochlea (Brown, 2011; Liberman et al., 1990).
3.4.6. Measurements at SOAE frequencies
Prior to data collection in the attention conditions, SOAEs were identified from recordings in the quiet, using our standard procedures (Pasanen and McFadden, 2000). Six of the eight subjects who participated in this experiment had SOAEs in the (right) ear from which our physiological-noise measures always were obtained. To examine whether SOAE magnitudes also were affected by the attentional manipulations, the averaged physiological-noise waveforms from triplet 1 of the inattention and dichotic-attention conditions were filtered around the frequencies of each subject’s two strongest SOAEs. As before, the bandwidth of the filter was 10% of the SOAE frequency, centered on the frequency of the SOAE. The same moving-window analysis procedure described above was used for the SOAE analyses.
The result was that the SOAE magnitudes measured during the silent period also were stronger for the inattention condition than for the dichotic condition (except for one SOAE for one subject where the difference was reversed by 0.3 dB). The average difference between the inattention and dichotic-attention conditions over the final ten windows of the silent period was 1.4 dB (compared to about 2.8 dB in Table 2 for all eight subjects and triplet 1 at 4.0 kHz). These are the results for triplet 1; this attentional effect on SOAEs was smaller for triplet 2. A simple explanation is that, during the perstimulatory period, cortico-olivo efferent activity was stronger in the attention condition than in the inattention condition, and those different levels of efferent activity then persisted through the silent period. Note that, during the perstimulatory period, there would have been two contributions to the overall efferent inhibition acting on the SOAEs: the reflexive component triggered by the wideband noise (Guinan, 2006; Guinan et al., 2003; Walsh et al., 2010a, 2010b) plus whatever modulation of the reflexive component existed because of selective attention.
Between-triplet correlations were calculated between the fine structures of the averaged responses obtained at SOAE frequencies, just as was reported for the averaged responses at 4.0 kHz (section 3.4.3). Unlike the systematically declining correlations observed through the silent period at 4.0 kHz, the between-triplet correlations between responses from corresponding analysis windows at SOAE frequencies remained at about 0.8 – 0.9 over the entire 30-ms silent period. This outcome suggests that SOAEs were present immediately after the end of the tone-plus-noise stimulus used to measure the physiological response, and further, the SOAEs apparently were being synchronized by the stimuli used (e.g., Wilson, 1980; Wit and Ritsma, 1979), as evidenced by the high running correlations between the fine structures of the averaged responses.
4. SUMMARY
A tone-plus-noise stimulus was used to activate the MOC efferent system and to obtain a nonlinear measure of cochlear response (the nSFOAE) during both the perstimulatory period and a 30-ms silent period following the tone-plus-noise stimulus. In some conditions, the subjects needed to attend to auditory stimuli in order to perform a behavioral task; in other conditions, they had no reason to attend.
The physiological noise recorded in the ear canals of our human subjects was substantially weaker during behavioral tasks requiring selective auditory attention than during a task involving relative inattention. This was true for all subjects. The magnitude of the difference was about 2 – 3 dB, which corresponded to effect sizes larger than 2.0. The implication is that the cortico-olivo component of the efferent system was more active, and therefore MOC efferent activity was greater, when selective attention was required to complete the behavioral task.
The magnitudes of the asymptotic physiological-noise responses were essentially identical for the dichotic- and diotic-attention conditions. Behavioral performance in those conditions also did not differ.
In the dichotic-attention task, the magnitude of the asymptotic physiological response in the right (ipsilateral) ear was essentially the same whether that ear contained the attended (female) voice or the non-attended (male) voice. That is, there was no evidence of the efferent system acting to implement switching between the ears in a condition where it might have been expected.
The physiological responses exhibited an initial period of decline in magnitude before reaching their asymptotic levels, presumably because of echo-like responses (decaying SFOAEs) to the tone-plus-noise stimuli that terminated just prior to the silent periods. The physiological responses during the inattention condition generally were greater than those during the dichotic and diotic conditions during this initial period of decline as well as during the remainder of the 30-ms silent period. This confirms that the differences across experimental conditions were attributable to persistence of the MOC activity present during the tone-plus-noise stimuli into the silent period.
Cross correlations for pairs of filtered, averaged responses from triplets 1 and 2 of the same block of trials were high (0.8 – 0.9) for the first several milliseconds of the silent period, and then declined to values near zero as asymptotic noise magnitudes were reached. This suggests both that the decline in physiological-response magnitude is attributable to a decline in the echo-like responses to the acoustic stimulus and that, for the asymptotic measures, the double-evoked procedure was essentially summing uncorrelated samples of noise.
Physiological-noise magnitudes measured at different frequencies in the spectrum also differed according to the attentional demands of the behavioral task, just as was observed at 4.0 kHz. In other words, the presence of the 4.0-kHz probe tone was not necessary to observe the effects of attention on cochlear noise. The time course of decay in the silent period was slightly longer, and the magnitude of decay greater, at low frequencies than at high frequencies.
The physiological-noise levels measured at SOAE frequencies in individual ears typically also were larger during inattention than during attention, suggesting that the efferent system was inhibiting the mechanisms underlying both SOAEs and SFOAEs. When cross correlations between triplets were examined at SOAE frequencies, the values remained high across the entire duration of the silent period. The strong implication is that the SOAEs were being synchronized by the nSFOAE-evoking stimuli.
Because behavioral responses of the same type (a key press) occurred at the end of every trial during both the inattention and attention conditions, the various differences between conditions cannot be attributed to the absence of similar motor responses in one condition, as was logically possible in past studies (Puel et al., 1988; Avan and Bonfils, 1992; Meric and Collet, 1992, 1994b; Froehlich et al., 1993; Ferber-Viart, 1995). The nature of the behavioral task can affect the results when attention is manipulated; our behavioral task was more similar to the identification tasks than to the detection tasks used by others (e.g., Hafter et al., 1998; Gallun et al., 2007).
5. DISCUSSION
We attribute the observed differences in physiological-noise magnitudes between the inattention and attention conditions to different levels of activation of the medial olivocochlear bundle (MOCB) associated with the differing attentional demands of the behavioral tasks. Just as the superior olivary complex (SOC) sends efferent projections into the cochlea, the SOC receives efferent projections from the inferior colliculus in the brainstem, as well as direct projections from auditory cortex (Mulders and Robertson, 2000a, 2000b). The very existence of these latter connections suggests the possibility that cognitive processes originating in auditory cortex can affect the processing of the sounds upon which those cognitive processes are being based. The MOCB generally is viewed to be the primary neural pathway through which attention can modulate cochlear activity.
We recognize that our nonlinear measure is not capable of distinguishing between noise that originates in the cochlea and noise that originates in the middle- or outer-ear cavities. Here we have presumed that the preponderance of what we measured during our 30-ms silent intervals originated from the cochlea largely because it is difficult for us to understand how attentional demand could alter the contributions from the middle or outer ears (details are below).
5.1. Interpreting the double-evoked measure
If, as argued above, the application of the double-evoked procedure to the responses measured during the 30-ms silent periods was tantamount to summing three independent samples of noise, then how is it that our estimates of that physiological noise were different depending upon the attentional demands of the behavioral task? In order to understand the answer, it is necessary to understand what we believe was happening prior to the silent periods, during the perstimulatory segments of each trial. Recall that each of the three nSFOAE-evoking stimuli presented during every triplet consisted of 50 ms of 4.0-kHz tone-alone, followed by 250 ms of tone-plus-wideband-noise, followed by the 30 ms of silence that was emphasized here. Guinan et al. (2003) have shown that single tones of moderate level are not particularly effective at activating the MOC efferent reflex, but wideband noises are, especially when they are binaural. In accord with Guinan’s reports and previous work in this laboratory (Walsh et al., 2010a, 2010b), the differencing manipulation required by the double-evoked procedure revealed the averaged, filtered nSFOAE response during the tone-alone period to be weak and essentially constant throughout the 50-ms duration of tone-alone (see Fig. 4). [The nonlinearity responsible for this nSFOAE response either stems from mechanisms other than the MOC system or it depends on some component of the MOC system having an extremely long time-constant (Backus and Guinan, 2006; Cooper and Guinan, 2003; Guinan, 2006; Sridhar et al., 1995).] Then, following the onset of the wideband noise, and the concomitant activation of the MOC efferent reflex (Guinan, 2006, 2011; Guinan et al., 2003), the nSFOAE response showed a rising, dynamic segment that took about 100 – 150 ms to reach asymptote (see Walsh et al., 2010a, Fig. 1; cf. Backus and Guinan, 2006), after which the response remained essentially constant throughout the remainder of the tone-plus-noise stimulus. Although MOC activation by the wideband noise is reflexive, its magnitude apparently differs depending upon the attention condition, meaning that the magnitude of the nSFOAE response at the end of the tone-plus-noise segment also differed depending upon the level of attention required. Specifically, we believe that, during the perstimulatory period, the reflexive MOC efferent response to the wideband noise was augmented by a cortico-olivo efferent response such that the overall efferent effect was greater during the attention conditions than during the inattention condition (those perstimulatory data will be reported more fully elsewhere). What is missing from this interpretation of the events during the perstimulatory period is why there also should be differences observed during the silent period following the perstimulatory period.
As previously demonstrated (Walsh et al., 2010a, Fig. 9; Walsh et al., 2010b, Fig. 6; Goodman and Keefe, 2006; Backus and Guinan, 2006), the reflexive efferent response activated by the wideband noise does not recover immediately upon the termination of the wideband noise, but rather it can persist unabated for several hundred milliseconds after noise offset. It follows that the level of the overall efferent activity at the onset of the silent periods (and throughout their short time course), was greater in the attention conditions than in the inattention condition. Accordingly, any measure obtained from the cochlea during the silent period that can be affected by efferent activity should have been different depending on the subjects’ level of attention. Specifically, we suggest that the decaying SFOAEs seen at the beginning of the silent period (Figs. 6 and 8), the nSFOAEs measured at SOAE frequencies, and the physiological-noise measurements obtained during the latter part of the silent period (Figs. 5 and 7), all were weaker during the attention conditions than the inattention conditions because the persisting efferent activity was greater during the attention conditions than the inattention conditions.
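The premise stated at the start of this section has a simple numerical consequence, sketched here in Python under stated assumptions. If the double-evoked combination amounts to adding or subtracting three uncorrelated noise samples of equal power (the +, +, − sign pattern used below is an assumption made for illustration), then the combined level exceeds that of a single sample by 10·log10(3), or about 4.8 dB, in every condition. A common scaling of the underlying noise, such as the hypothetical 2.5-dB reduction used below, therefore passes through the combination unchanged, which is why the combined measure can still differ across attention conditions.

```python
import math
import random


def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def level_db(x, ref=1.0):
    """Level in dB relative to an arbitrary reference (an assumption)."""
    return 20.0 * math.log10(rms(x) / ref)


random.seed(0)
n = 20_000
# Three independent, equal-power noise samples (arbitrary units).
a, b, c = ([random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3))

# Assumed sign pattern for the combination; for uncorrelated samples the
# signs do not affect the expected power.
combined = [x + y - z for x, y, z in zip(a, b, c)]
gain_db = level_db(combined) - level_db(a)
expected_db = 10.0 * math.log10(3.0)

# Hypothetical 2.5-dB reduction applied equally to every underlying sample.
k = 10.0 ** (-2.5 / 20.0)
combined_reduced = [k * (x + y - z) for x, y, z in zip(a, b, c)]
drop_db = level_db(combined) - level_db(combined_reduced)

print(f"gain {gain_db:.2f} dB (expected {expected_db:.2f}); drop {drop_db:.2f} dB")
```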
The physiological noise measured during the latter part of the silent period has, to our knowledge, not been reported previously. What is the origin of this noise? There may be multiple sources. Over the years, investigators have discussed noise processes, particularly Brownian motion, in various cochlear elements (e.g., de Vries, 1948; Corey and Hudspeth, 1983; Harris, 1968; Svrcek-Seiler et al., 1998), and some have discussed the potential benefits to detection afforded by stochastic resonance (e.g., Jaramillo and Wiesenfeld, 1998). Most relevant here, of course, are those mechanisms that lead eventually to acoustical noise, not just molecular, receptor, or neural noise. Rather than discuss the merits of various possible mechanisms, we will summarize what we know about the characteristics of the noise we have measured during the silent periods. It is wideband; the physiological noise we measured during the silent period existed over a range of about 1.1 to 7.0 kHz. It was present in every subject tested, and, for every subject, its level covaried with the attention condition at every frequency analyzed. Our physiological-noise function has a U-shape: the largest magnitudes were measured at the lowest and highest frequencies tested. (In contrast, the spectrum of Brownian noise declines significantly per octave from low to high frequencies.) Over the middle and upper frequency range, the spectrum of the physiological noise is highly similar in shape to the spectrum of the noise we measured in a passive cavity using the same procedures and equipment.
We have suggested that we are measuring cochlear noise because we know that cochlear noise exists in healthy ears, that this inherent noise is broadband, and that it can be suppressed by efferent activation. Nuttall et al. (1997) measured basilar membrane (BM) vibration in the guinea-pig cochlea in the absence of sound stimulation. They demonstrated that BM noise was associated with a healthy, sensitive cochlea; BM noise decreased over the duration of surgery, and disappeared completely with the death of the animal. Stimulation of the crossed olivocochlear bundle (COCB) significantly suppressed BM noise by about 10 dB. Nuttall et al. (1997) concluded that BM noise is broadband because the noise function measured in quiet was very similar to the response function to a low-level broadband stimulus. They proposed that thermal noise could be the origin of this energy because the spectrum of thermal noise is approximately flat.
An alternative explanation is that the noise we have measured did not originate in the cochlea, but rather that the observed differences in noise level resulted simply from our subjects sitting more quietly in the attention conditions than in the inattention conditions. That way, there would be more head and body noise recorded in the inattention conditions. However, there is considerable evidence against this suggestion. First, unlike many past studies (Puel et al., 1988; Avan and Bonfils, 1992; Meric and Collet, 1992, 1994b; Froehlich et al., 1993; Ferber-Viart, 1995), our subjects were required to make the same behavioral response on every trial whether it was an inattention block or an attention block. Second, our subjects were given trial-by-trial feedback as to whether their physiological responses were being rejected as too noisy, meaning that they knew when additional trials were being added to the block; this encouraged them to sit quietly. Third, if the subjects were more restless in the inattention condition, one would expect that more trials would be rejected by our acceptance criteria (see Appendix, section 6.1) during the inattention blocks than during attention blocks. In fact, however, the duration of a block of trials was similar in the inattention and attention conditions, suggesting that the levels of extraneous subject noise were similar across conditions: the average block duration for the inattention condition was 270.1 s (SD = 69.1), and for the dichotic condition it was 270.6 s (SD = 79.0). The number of trials in each condition also was very similar: the average number of trials for the inattention condition was 32.3 (SD = 3.6), and for the dichotic condition it was 32.2 (SD = 4.4). Fourth, on every trial we measured the time elapsed from the onset of the behavioral response interval until the subject’s button press – the reaction time or RT. 
For every subject, the RTs for the inattention task invariably were substantially faster than the RTs for the attention tasks. If our subjects were totally inattentive and moving around restlessly during the inattention blocks, then one would expect the RTs to be slow, not fast. Our subjects were attending to the stimulus sequences during the inattention blocks; they just did not have to remember the sounds for a subsequent cognitive task.
A related possibility is that subjects specifically reduced their cardiac, pulmonary, or muscular noise during the attention blocks. However, this alternative also is an implausible explanation for our data. Ren et al. (1995) showed that the acoustic energy produced by the cardiac cycle is weak, and has spectral components predominantly located below 500 Hz. Similarly, Gavriely et al. (1981) measured the spectral characteristics of the human pulmonary system and showed that the maximal frequency for inhalation and exhalation – the frequency beyond which no acoustic energy was detectable – ranged from 286 – 604 Hz, depending upon where on the chest wall the measurements were made. Finally, acoustic noise resulting from contractions of skeletal muscles in humans has been shown to be below about 25 Hz (Diemont et al., 1988; Oster and Jaffe, 1990). In contrast, the lowest frequency at which we measured noise levels in our subjects was 1074 Hz, and we observed consistent differences across subjects and frequencies up to 7000 Hz.
Another explanation that has been proposed is that the middle-ear reflex (MER) was responsible for our findings. Specifically, perhaps the MER made both forward and reverse transmission less efficient (e.g., Puria, 2003) in the attention conditions. We regard this to be an unlikely explanation for the attention/inattention differences we have obtained. To be sure, the stimulus levels used in this study were in the range where the MER could be activated in some sensitive subjects (Guinan et al., 2003). But to explain the differences in physiological noise measured in our attention and inattention conditions, the MER would have to contract differently in the two conditions even though all of the acoustic stimuli (the speech sounds and the tone-plus-noise used to evoke efferent activity) were identical in the two conditions. We are not aware of any evidence suggesting that the MER is differentially affected depending upon cognitive demands, and even if it were, the expectation would be that the biggest effects would be at low frequencies whereas our attention/inattention differences were about the same across frequency. Finally, note that this MER proposal does seem to assume that the physiological noise we have measured during the silent period has its origin in the cochlea, which we do believe. Although we cannot know for certain whether the MER was activated for some of our subjects in this study, we doubt strongly that the MER was responsible for the differential measurements obtained for attention and inattention conditions. Differential activation of the efferent system is a more parsimonious explanation.
5.1.1. A speculation
While discussing our selective-attention results with colleagues, we were informed of an intriguing mechanism apparently used by the visual system during attention (W.S. Geisler III, personal communication, 2012). Mitchell et al. (2009) showed that there are low-frequency (< 5 Hz) fluctuations in firing rate in neurons located in the V4 region of visual cortex in rhesus monkey. Furthermore, those low-frequency fluctuations in rate are ordinarily correlated across neurons. Then, when attention is demanded of the monkey to accomplish a discrimination task, the low-frequency fluctuations cease being correlated. The interpretation of these findings is that, when information is pooled across populations of neurons sensitive to similar aspects of the attended stimuli, the effective signal-to-noise ratio is improved by the decorrelation in the firings. This finding led us to consider whether a similar mechanism might be operating in the cochlea. The following speculation was the result.
Imagine that, in the quiet, the basilar membrane (or associated structures, or both) undergo weak vibrations that are attributable in part to the random, spontaneous firings of the MOC efferent fibers. Each MOC fiber contacts dozens of OHCs spread across the three linear rows, and each OHC receives innervation from several different fibers (Warr and Guinan, 1979; Brown, 2011). The density of this efferent innervation is greater in the middle and at the basal end of the cochlea than at the apical end (Liberman et al., 1990; Brown, 2011), but along much of the length of the basilar membrane, there is a complex, overlapping mesh of efferent innervation. We speculate that, during inattention, the random, spontaneous firings in the MOC fibers are somewhat positively correlated, meaning that individual OHCs have a relatively high probability of receiving nearly synchronous neurotransmitter releases from more than one input fiber. We know that efferent input to an OHC causes inhibition (e.g., Brown et al., 1983; Cooper and Guinan, 2006; Guinan, 2006, 2011; Guinan et al., 2003; Murugasu and Russell, 1996; Rabbitt et al., 2009; Wiederhold and Kiang, 1970), so moments of multiple, nearly synchronous activations alternating with moments of fewer or no activations would mean that there would be corresponding alternations in the level of inhibition of the affected OHCs. We speculate that those nearly synchronous activations, and the consequent nearly regular alternations in inhibition, lead to nearly regular alternations in the lengths of the OHCs (Brownell et al., 1985; Breneman et al., 2009) that in turn lead to weak vibrations, that then sum with other local random vibrations, propagate basally, and escape into the ear canal where they can be recorded by our microphone. 
This cochlear noise ordinarily would not be separable from other sources of noise in the ear canal, but its susceptibility to modulation by cortico-olivo influences both allows it to be detected and confirms its cochlear origin.
To complete this speculation, we presume that, under heavy attentional load, the increase in efferent activation that persists throughout the silent period coincides with a persisting decrease in synchronization of the firing patterns in the overlapping efferent fibers. This loss of synchronization produces smaller differences in the alternating magnitudes of inhibition, and the magnitude of the sound produced is reduced. Were a mechanism of this sort operating, the auditory system would be using a similar strategy to that apparently used by the visual system – desynchronization of firings – to solve a similar problem when attention is required either within or across modalities.
5.2. Controls
Our auditory-attention and inattention conditions were designed to be as similar as possible and to differ primarily in the amount of cognitive resources required to perform the behavioral task. The dichotic- and diotic-listening conditions had both selective-attention (Treisman, 1969) and working-memory components (Baddeley, 1992), but the inattention condition had neither of these components. Specifically, for the dichotic-listening condition: auditory attention was required to locate the female talker at either the left or right ear, and then to maintain focus on the female speech stream while concurrently ignoring the male speech stream. While the subject was attending to the female talker, working memory was required to retain the sequence of digits that she spoke until the response interval began. Similar components were required in the diotic-listening task. In contrast, in the inattention condition, the behavioral task only required that a subject listen for the final sound on every trial. Then, for the attention and inattention tasks alike, the actual behavioral response was the same: a button press with the right hand.
At the level of the auditory cortex, the sounds presented during the inattention and selective-listening conditions differed greatly in terms of their perceptual saliency, but for the cochlea, the sounds presented during every condition should have been processed similarly. The spectra were identical, the temporal envelopes were identical, and the overall level of the SSN stimuli was equal to the overall level of the speech waveforms (about 50 dB SPL). Furthermore, the SSN stimuli were lowpass filtered at 3.0 kHz, just like the actual speech waveforms, and the timing of the presentation of the waveforms was identical in the inattention and selective-listening conditions. In other words, the SSN stimuli had the characteristics of speech without actually sounding like speech, and they were presented exactly like the speech stimuli. As in the selective-listening conditions, the physiological responses during the inattention condition were evaluated for acceptance only if a key-press was recorded during the response interval. Again, the nSFOAE stimuli were identical across all conditions.
5.3. Comparison with previous reports
Common to many past studies of the effects of attention on OAEs was a concern that differential levels of physiological noise might exist across experimental conditions (Froehlich et al., 1990, 1993; Giard et al., 1994; Ferber-Viart et al., 1995; de Boer and Thornton, 2007; Harkrider and Bowers, 2009), and that these differences in background noise would have the potential to confound the interpretation of their OAE measures. Often, this concern was rooted in the fact that the attention task commonly required a motor response (typically a button-press), but the inattention task did not (Puel et al., 1988; Avan and Bonfils, 1992; Meric and Collet, 1992, 1994b; Froehlich et al., 1993; Ferber-Viart, 1995).
In those studies that did measure and report the noise levels from their OAE recordings (Froehlich et al., 1990, 1993; Ferber-Viart et al., 1995; de Boer and Thornton, 2007; Harkrider and Bowers, 2009), only de Boer and Thornton (2007) found a significant main effect of behavioral task (inattention versus attention) on the physiological-noise magnitudes in their OAE measures. In that study, click-evoked otoacoustic emissions (CEOAEs) were recorded and then saved to two buffers in an alternating fashion, thus yielding a pair of averaged CEOAE responses. For each pair, noise level was calculated by subtracting one averaged CEOAE response from the other, and then converting the rms of the difference waveform to dB.
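The two-buffer noise estimate described above can be sketched as follows. This is an illustration of the general technique, not de Boer and Thornton's implementation: the function name, the reference value for the decibel conversion, and the synthetic waveforms are all assumptions.

```python
import math
import random


def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def buffer_noise_db(buf_a, buf_b, ref=1.0):
    """Level (dB re `ref`) of the point-by-point difference of two averaged buffers."""
    diff = [a - b for a, b in zip(buf_a, buf_b)]
    return 20.0 * math.log10(rms(diff) / ref)


# Synthetic check: the same repeatable "emission" in both buffers, plus
# independent noise (SD = 0.1, arbitrary units) in each buffer.
random.seed(2)
n = 5000
emission = [math.sin(2 * math.pi * i / 25) for i in range(n)]
buf_a = [e + random.gauss(0.0, 0.1) for e in emission]
buf_b = [e + random.gauss(0.0, 0.1) for e in emission]
noise_db = buffer_noise_db(buf_a, buf_b)
print(f"estimated noise level: {noise_db:.1f} dB re 1.0")
```

The repeatable component cancels in the subtraction, so the result reflects only the noise. For two independent noise samples of equal standard deviation, the difference waveform has a standard deviation larger by a factor of √2.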
De Boer and Thornton (2007) observed that physiological-noise levels were significantly higher on average during an inattention task (pressing a button at the onset of each contralateral-noise presentation) than during a passive visual-attention task (watching a silent movie), and marginally significantly higher than during an active auditory-attention task (detecting tone pips in the click-train stimulus). There was no difference in noise level, however, between the inattention task and an active visual-attention task (judging the correctness of simple additive sums). For all comparisons, the magnitude of the difference between the inattention and attention conditions was less than 1.0 dB. Notably, the one significant difference in noise level was for the comparison between the inattention task, which required a motor response (a button-press), and the passive visual-attention task, which did not; noise levels were higher when a motor response was required. In the de Boer and Thornton study, there also were significantly more rejected CEOAE measures (due to excessive noise) during the inattention task than during either the passive visual- or active auditory-attention tasks, but not during the active visual-attention task, which had a higher rejection rate than the inattention task. Overall, between the noise and rejection measures, there generally was more subject noise associated with the inattention task than with the various attention tasks (which was not true in the present study; see above).
In a similar study, Harkrider and Bowers (2009) also used otoacoustic emissions to study the effects of attention on peripheral auditory processing. Like de Boer and Thornton (2007), they measured CEOAEs, saved the responses to alternating buffers, and calculated a noise level for each condition by subtracting the two averaged responses. Unlike de Boer and Thornton, however, Harkrider and Bowers did not observe a significant main effect of condition on noise level. They did observe more rejected CEOAE measures in the inattention condition than in their auditory-attention conditions, but none of these differences were statistically significant.
In the present study, the same motor response (a button-press on a keypad) was required on every trial of every condition, thus eliminating the possibility that the substantial differences in noise levels across conditions were due to differences in the motor behavior of our subjects. It is possible that we observed larger effects of attention on cochlear noise levels than previous researchers because we used a bilateral MOC elicitor, which produces more activation than either an ipsilateral or contralateral noise presented in isolation (Lilaonitkul and Guinan, 2009), or because our behavioral tasks were more demanding cognitively than those used previously. The effects reported here actually could be underestimates of what would be obtained with a slightly stronger wideband noise, because MOC activation increases with increasing level of the eliciting sound over a range of moderate sound-pressure levels (Veuillet et al., 1991; Guinan et al., 2003; Backus and Guinan, 2006).
5.4. Final comment
In conclusion, we note that our results provide no obvious evidence for a mechanism that could help with differential attention to one ear or to different frequency regions within an ear; nor is there evidence for a mechanism that could switch attention rapidly between various auditory targets. For the stimuli employed here, at least, the mechanism we have documented operated equally on the attended and non-attended ears, operated globally across the spectrum, and showed persistence (was sluggish; also see Walsh et al., 2010a, 2010b). Although individual readers may join us in finding some of our new results interesting, the reality is that we still do not know whether or how the auditory system profits from the additional efferent activity during selective attention, nor even whether there are degrees of activation of the attention mechanism. All we know is that MOC efferent activity does appear to be correlated with selective attention. Learning more will require more subtle manipulations than were reported here.
Highlights.
Physiological noise was weaker during selective auditory attention than inattention.
Noise levels were similar for the dichotic- and diotic-attention conditions.
Noise levels were similar whether the right or left ear was attended.
The effects of attention were evident across the frequency spectrum.
The effects of attention also were evident at subjects’ SOAE frequencies.
Acknowledgments
This study was done as part of the requirements for a doctoral degree from The University of Texas at Austin, by author KPW, who now is located at the University of Minnesota. The work was supported by a research grant awarded to author DM by the National Institute on Deafness and other Communication Disorders (NIDCD; R01 DC000153). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDCD or the National Institutes of Health. D.O. Kim and J.G. Guinan provided helpful discussion about these results, and two anonymous reviewers provided thoughtful comments.
ABBREVIATIONS
- ANOVA: analysis of variance
- CEOAE: click evoked otoacoustic emission
- d: Cohen’s d, effect size
- DCN: dorsal cochlear nucleus
- FFT: fast Fourier transform
- HL: hearing level
- ISI: interstimulus interval
- MOC: medial olivocochlear
- MOCB: medial olivocochlear bundle
- nSFOAE: nonlinear stimulus-frequency otoacoustic emission
- OAE: otoacoustic emission
- OHC: outer hair cell
- SFOAE: stimulus-frequency otoacoustic emission
- SOAE: spontaneous otoacoustic emission
- SOC: superior olivary complex
- SSN: speech-shaped noise
- dB SPL: decibels sound-pressure level
- Hz: hertz
- kHz: kilohertz
- min: minute
- ms: millisecond
Appendix
6.1. nSFOAE Acceptance Criteria
The nSFOAE procedure began with a calibration routine, during which no speech stimuli were presented. The level of a 500-Hz tone was adjusted in the right ear canal of the subject to attain 65 dB SPL. This routine was run separately for each of the ER-2 earphones. A calibration factor was calculated and then was used to scale the amplitude of the experimental stimulus used to elicit the nSFOAE (the tone-plus-noise stimulus described above). This calibration routine was followed by two criterion-setting routines, and then the main data-acquisition routine. The first criterion-setting routine consisted of 12 trials (each having two triplets) during which all nSFOAE responses were accepted unless the peak amplitude of the recording exceeded 45 dB SPL. (Difference-waveforms whose amplitudes exceeded this limit typically were observed when the subject moved, swallowed, or produced some other artifactual noise.) All of the accepted nSFOAE responses were averaged point-by-point, and the resultant waveform served as the foundation for the accumulating nSFOAE average to be constructed during the main acquisition routine. Furthermore, the rms value of each accepted nSFOAE response was computed, and the resulting distribution of rms values was used to evaluate subsequent responses during the main acquisition routine. The second criterion-setting routine consisted of a 20-s recording in the quiet during which no sound was presented to the ears. The median rms voltage from this recording was calculated, and was used as a measure of the ambient (physiological) noise level for that individual subject.
During the main acquisition routine, each new nSFOAE response was compared to the data collected during the two criterion-setting routines, and was accepted into the accumulating nSFOAE average if either one of two criteria was satisfied. First, if the rms value of the new nSFOAE response was less than 0.25 standard deviations above the median rms of the saved distribution, then the nSFOAE was added to the accumulating average. Second, each new nSFOAE response was subtracted point-for-point from the accumulating nSFOAE average. The rms of this difference waveform was computed, then converted to decibels. If the magnitude of the difference waveform was less than 6.0 dB above the noise level measured earlier in the quiet, the new nSFOAE was accepted into the accumulating average. Subjects received feedback at the end of each trial about whether the physiological data met the criteria for acceptance. An additional trial was added to the block whenever the physiological data were not acceptable, which encouraged the subjects to remain as still and quiet as possible. The nSFOAE responses from triplet 1 always were averaged and analyzed separately from the responses from triplet 2, and the block of trials terminated when each of the two averages was composed of about 20 – 30 nSFOAE responses. Eventually the accepted responses were pooled across at least four blocks of trials (see text).
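The two acceptance criteria can be summarized in a short Python sketch. It is an illustrative reading of the description above, not the acquisition software itself: the function and parameter names are hypothetical, levels are expressed in dB relative to an arbitrary reference of 1.0, and the use of the sample standard deviation for the saved distribution is an assumption.

```python
import math
import statistics


def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def to_db(value, ref=1.0):
    """Convert an rms value to dB re `ref` (the reference is an assumption)."""
    return 20.0 * math.log10(value / ref)


def accept_response(new_resp, running_avg, criterion_rms, quiet_noise_db,
                    sd_factor=0.25, margin_db=6.0):
    """Return True if either acceptance criterion is satisfied."""
    # Criterion 1: rms of the new response is below the median of the saved
    # rms distribution plus 0.25 standard deviations.
    limit = (statistics.median(criterion_rms)
             + sd_factor * statistics.stdev(criterion_rms))
    if rms(new_resp) < limit:
        return True
    # Criterion 2: the difference between the accumulating average and the
    # new response is less than 6.0 dB above the noise level measured in quiet.
    diff_rms = rms([a - r for a, r in zip(running_avg, new_resp)])
    if diff_rms == 0.0:
        return True  # identical waveforms; avoids log10(0)
    return to_db(diff_rms) < quiet_noise_db + margin_db


# Toy values (arbitrary units) chosen only to exercise both criteria.
criterion_rms = [0.9, 1.0, 1.1, 1.0, 1.2]
running_avg = [1.0, -1.0, 1.0, -1.0]
quiet_noise_db = -20.0

quiet_trial = [0.9, -0.9, 0.9, -0.9]     # passes criterion 1
similar_trial = [1.1, -1.1, 1.1, -1.1]   # fails 1, passes criterion 2
artifact_trial = [3.0, 3.0, -3.0, 3.0]   # fails both criteria

for trial in (quiet_trial, similar_trial, artifact_trial):
    print(accept_response(trial, running_avg, criterion_rms, quiet_noise_db))
```

In this reading, a rejected trial would simply be excluded from the accumulating average and an additional trial appended to the block, as described above.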
6.2. Comparison of dichotic- and diotic-attention conditions
Before a determination could be made about the comparability of the dichotic- and diotic-listening conditions, the two sets of data had to be made equivalent. As mentioned above, approximately half the number of individual trials contributed to the averaged physiological responses during the dichotic (and inattention) blocks of trials as during the diotic blocks—because the responses on ipsilateral trials initially were kept separate from those on contralateral trials. To make the data equivalent, the raw ipsilateral responses were summed, point for point, with the raw contralateral responses within each dichotic block of trials prior to doing window-by-window analyses like those already described. That is, for the dichotic blocks, the response waveforms themselves were summed prior to extracting the estimates of level within the successive 10-ms windows – just as for the diotic blocks. [The demonstration of the comparability of the ipsilateral and contralateral responses (Table 1) confirms that this summing did not distort the data.]
The results of this analysis are shown in Table A1. The individual entries are window-by-window averages across the ten 10-ms analysis windows beginning 10 ms into the silent period (the asymptotic values), then averaged across at least four blocks of trials. Although the dichotic and diotic data did exhibit differences for some subjects, the means at the bottom of Table A1 reveal that, on average, the asymptotic levels were essentially identical across the two listening conditions and the two triplets. Note that the effect sizes for the dichotic/diotic difference were small for both triplets. As a consequence of this comparison, for simplicity, the diotic data are omitted throughout the Results section; for all comparisons discussed, the diotic and dichotic data were essentially the same.
Table A1.
Subject | Triplet 1, Dichotic | Triplet 1, Diotic | Triplet 2, Dichotic | Triplet 2, Diotic |
---|---|---|---|---|
L01 | −13.2 | −11.0 | −12.5 | −14.1 |
L02 | −12.4 | −12.4 | −12.8 | −12.4 |
L03 | −13.0 | −14.2 | −14.5 | −13.9 |
L04 | −15.6 | −12.5 | −14.2 | −13.6 |
L05 | −13.6 | −13.9 | −13.4 | −13.9 |
L06 | −14.1 | −14.5 | −12.7 | −13.6 |
L07 | −10.6 | −12.1 | −11.8 | −11.1 |
L08 | −14.4 | −15.7 | −14.4 | −12.0 |
Mean | −13.4 | −13.3 | −13.3 | −13.1 |
Std. Dev. | 1.5 | 1.5 | 1.0 | 1.1 |
Effect Size (Dichotic - Diotic) | 0.0 | | −0.2 | |
Within each block of trials, the raw ipsilateral and contralateral responses were averaged prior to window-by-window analysis.
Each entry is a window-by-window mean, first across the final ten 10-ms windows of the silent period and then across at least four 30-trial blocks, corresponding to about 80–120 trials (correct only) averaged for each physiological response.
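The summary rows of Table A1 can be checked with a short calculation. The sketch below assumes that the effect size is a standardized mean difference (dichotic minus diotic) divided by the root mean square of the two sample standard deviations; that formula is an assumption made here for illustration, not a statement of how the tabled values were computed.

```python
import math
import statistics

# Individual-subject entries from Table A1 (subjects L01 to L08).
TABLE_A1 = {
    "triplet1": {
        "dichotic": [-13.2, -12.4, -13.0, -15.6, -13.6, -14.1, -10.6, -14.4],
        "diotic": [-11.0, -12.4, -14.2, -12.5, -13.9, -14.5, -12.1, -15.7],
    },
    "triplet2": {
        "dichotic": [-12.5, -12.8, -14.5, -14.2, -13.4, -12.7, -11.8, -14.4],
        "diotic": [-14.1, -12.4, -13.9, -13.6, -13.9, -13.6, -11.1, -12.0],
    },
}


def effect_size(a, b):
    """Standardized mean difference (a minus b); pooling rule is an assumption."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2.0)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


summary = {}
for triplet, cond in TABLE_A1.items():
    summary[triplet] = {
        "mean": {k: round(statistics.mean(v), 1) for k, v in cond.items()},
        "sd": {k: round(statistics.stdev(v), 1) for k, v in cond.items()},
        "effect": round(effect_size(cond["dichotic"], cond["diotic"]), 1),
    }

for triplet, stats in summary.items():
    print(triplet, stats)
```

Under that assumption, the individual entries reproduce the tabled means, standard deviations, and effect sizes to one decimal place (the triplet 1 effect size rounds to zero).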
Footnotes
We regard the apparently asymptotic noise levels seen at the end of the silent period as only temporarily asymptotic. If the differences in the physiological noise seen at the end of the silent period are in fact attributable to differences in the strength of the efferent effect under attention and inattention, then we should expect those differences to diminish as the silent period lengthens. That is, as the persistence of the efferent effect begins to decline (i.e., after a few hundred milliseconds of silence; see Backus and Guinan, 2006; Walsh et al., 2010b), we should expect the physiological noise for the attention and inattention conditions to begin to converge, and with further increases in the silent period, that convergence should become complete. The final level should be somewhere above that for the current inattention condition.
References
- Avan P, Bonfils P. Analysis of possible interactions of an attentional task with cochlear micromechanics. Hear. Res. 1992;57:269–275. doi: 10.1016/0378-5955(92)90156-h.
- Backus BC, Guinan JJ., Jr. Time-course of the human medial olivocochlear reflex. J. Acoust. Soc. Am. 2006;119:2889–2904. doi: 10.1121/1.2169918.
- Bacon SP, Fay RR, Popper AN, editors. Compression: from cochlea to cochlear implants. Springer; New York: 2004.
- Baddeley A. Working memory. Science. 1992;255:556–559. doi: 10.1126/science.1736359.
- Breneman KD, Brownell WE, Rabbitt RD. Hair cell bundles: flexoelectric motors of the inner ear. PLoS One. 2009;4(4):e5201. doi: 10.1371/journal.pone.0005201.
- Brown MC. Anatomy of olivocochlear neurons. In: Ryugo DK, Fay RR, Popper AN, editors. Auditory and Vestibular Efferents. Springer; New York: 2011. pp. 17–37.
- Brown MC, Nuttall AL, Masta RI. Intracellular recordings from cochlear inner hair cells: effects of stimulation of the crossed olivocochlear efferents. Science. 1983;222:69–72. doi: 10.1126/science.6623058.
- Brownell WE, Bader CR, Bertrand D, de Ribaupierre Y. Evoked mechanical responses of isolated cochlear outer hair cells. Science. 1985;227:194–196. doi: 10.1126/science.3966153.
- Cherry C. Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 1953;25:975–979.
- Cohen J. A power primer. Psych. Bull. 1992;112:155–159. doi: 10.1037//0033-2909.112.1.155.
- Cooper NP, Guinan JJ., Jr. Separate mechanical processes underlie fast and slow effects of medial olivocochlear efferent activity. J. Physiol. 2003;548:307–312. doi: 10.1113/jphysiol.2003.039081.
- Cooper NP, Guinan JJ., Jr. Efferent-mediated control of basilar membrane motion. J. Physiol. 2006;576:49–54. doi: 10.1113/jphysiol.2006.114991.
- Corey DP, Hudspeth AJ. Kinetics of the receptor current in bullfrog saccular hair cells. J. Neurosci. 1983;3:962–976. doi: 10.1523/JNEUROSCI.03-05-00962.1983.
- Davis H. An active process in cochlear mechanics. Hear. Res. 1983;9:79–90. doi: 10.1016/0378-5955(83)90136-3.
- de Boer J, Thornton ARD. Effect of subject task on contralateral suppression of click evoked otoacoustic emissions. Hear. Res. 2007;233:117–123. doi: 10.1016/j.heares.2007.08.002.
- de Vries HL. Brownian movement and hearing. Physica. 1948;14:48–60.
- Diemont B, Figini MM, Orizio C, Perin R, Veicsteinas A. Spectral analysis of muscular sound at low and high contraction level. Int. J. Bio-Med. Comp. 1988;23:161–175. doi: 10.1016/0020-7101(88)90011-6.
- Ferber-Viart C, Duclaux R, Collet L, Guyonnard F. Influence of auditory stimulation and visual attention on otoacoustic emissions. Physiol. Behav. 1995;57:1075–1079. doi: 10.1016/0031-9384(95)00012-8.
- Fritz JB, Elhilali M, David SV, Shamma SA. Auditory attention – focusing the searchlight on sound. Curr. Opin. Neurobiol. 2007;17:437–455. doi: 10.1016/j.conb.2007.07.011.
- Froehlich P, Collet L, Chanal J-M, Morgon A. Variability of the influence of a visual task on the active micromechanical properties of the cochlea. Brain Res. 1990;508:286–288. doi: 10.1016/0006-8993(90)90408-4.
- Froehlich P, Collet L, Morgon A. Transiently evoked otoacoustic emission amplitudes change with changes of directed attention. Physiol. Behav. 1993;53:679–682. doi: 10.1016/0031-9384(93)90173-d.
- Gallun FJ, Mason CR, Kidd G., Jr. Task-dependent costs in processing two simultaneous auditory stimuli. Perception and Psychophysics. 2007;69:757–771. doi: 10.3758/bf03193777.
- Gavriely N, Palti Y, Alroy G. Spectral characteristics of normal breath sounds. J. Appl. Physiol. 1981;50:307–314. doi: 10.1152/jappl.1981.50.2.307.
- Giard M-H, Collet L, Bouchet P, Pernier J. Auditory selective attention in the human cochlea. Brain Res. 1994;633:353–356. doi: 10.1016/0006-8993(94)91561-x.
- Goodman SS, Keefe DH. Simultaneous measurement of noise-activated middle-ear muscle reflex and stimulus frequency otoacoustic emissions. J. Assoc. Res. Otolaryngol. 2006;7:125–139. doi: 10.1007/s10162-006-0028-9.
- Guinan JJ., Jr. Cochlear efferent innervation and function. Curr. Opin. Otolaryngol. Head Neck Surg. 2010;18:447–453. doi: 10.1097/MOO.0b013e32833e05d6.
- Guinan JJ., Jr. Olivocochlear efferents: anatomy, physiology, function, and the measurement of efferent effects in humans. Ear Hear. 2006;27:589–607. doi: 10.1097/01.aud.0000240507.83072.e7.
- Guinan JJ., Jr. Physiology of the medial and lateral olivocochlear systems. In: Ryugo DK, Fay RR, Popper AN, editors. Auditory and Vestibular Efferents. Springer; New York: 2011. pp. 39–81.
- Guinan JJ, Jr., Backus BC, Lilaonitkul W, Aharonson V. Medial olivocochlear efferent reflex in humans: otoacoustic emission (OAE) measurement issues and the advantages of stimulus frequency OAEs. J. Assoc. Res. Otolaryngol. 2003;4:521–540. doi: 10.1007/s10162-002-3037-3.
- Hafter ER, Bonnel A-M, Gallun E, Cohen E. A role for memory in divided attention between two independent stimuli. In: Palmer AR, Rees A, Summerfield AQ, Meddis R, editors. Psychophysical and physiological advances in hearing. Whurr; London: 1998. pp. 228–237.
- Harkrider AW, Bowers CD. Evidence for a cortically mediated release from inhibition in the human cochlea. J. Am. Acad. Audiol. 2009;20:208–215. doi: 10.3766/jaaa.20.3.7.
- Harris GG. Brownian motion in the cochlear partition. J. Acoust. Soc. Am. 1968;44:176–186. doi: 10.1121/1.1911052.
- Hernández-Peón R, Scherrer H, Jouvet M. Modification of electric activity in cochlear nucleus during “attention” in unanesthetized cats. Science. 1956;123:331–332. doi: 10.1126/science.123.3191.331.
- Jaramillo F, Wiesenfeld K. Mechanoelectrical transduction assisted by Brownian motion: a role for noise in the auditory system. Nature Neurosci. 1998;1(5):384–388. doi: 10.1038/1597.
- Keefe DH. Double-evoked otoacoustic emissions. I. Measurement theory and nonlinear coherence. J. Acoust. Soc. Am. 1998;103:3489–3498. doi: 10.1121/1.423058.
- Kemp DT. Stimulated acoustic emissions from within the human auditory system. J. Acoust. Soc. Am. 1978;64:1386–1391. doi: 10.1121/1.382104.
- Kemp DT. Towards a model for the origin of cochlear echoes. Hear. Res. 1980;2:533–548. doi: 10.1016/0378-5955(80)90091-x.
- Liberman MC, Dodds LW, Pierce S. Afferent and efferent innervation of the cat cochlea: quantitative analysis with light and electron microscopy. J. Comp. Neurol. 1990;301:443–460. doi: 10.1002/cne.903010309.
- Lilaonitkul W, Guinan JJ., Jr. Human medial olivocochlear reflex: effects as functions of contralateral, ipsilateral, and bilateral elicitor bandwidths. J. Assoc. Res. Otolaryngol. 2009;10:459–470. doi: 10.1007/s10162-009-0163-1.
- Lukas JH. Human auditory attention: The olivocochlear bundle may function as a peripheral filter. Psychophysiol. 1980;17:444–452. doi: 10.1111/j.1469-8986.1980.tb00181.x.
- Maison S, Micheyl C, Collet L. Influence of focused auditory attention on cochlear activity in humans. Psychophysiol. 2001;38:35–40.
- Meric C, Collet L. Visual attention and evoked otoacoustic emissions: a slight but real effect. Int. J. Psychophysiol. 1992;12:233–235. doi: 10.1016/0167-8760(92)90061-f.
- Meric C, Collet L. Attention and otoacoustic emissions: a review. Neurosci. Biobehav. Rev. 1994a;18:215–222. doi: 10.1016/0149-7634(94)90026-4.
- Meric C, Collet L. Differential effects of visual attention on spontaneous and evoked otoacoustic emissions. Int. J. Psychophysiol. 1994b;17:281–289. doi: 10.1016/0167-8760(94)90070-1.
- Mitchell JF, Sundberg KA, Reynolds JH. Spatial attention decorrelates intrinsic activity fluctuations in macaque area V4. Neuron. 2009;63:879–888. doi: 10.1016/j.neuron.2009.09.013.
- Mulders WHAM, Robertson D. Effects on cochlear responses of activation of descending pathways from the inferior colliculus. Hear. Res. 2000a;149:11–23. doi: 10.1016/s0378-5955(00)00157-x.
- Mulders WHAM, Robertson D. Evidence for direct cortical innervation of medial olivocochlear neurons in rats. Hear. Res. 2000b;144:65–72. doi: 10.1016/s0378-5955(00)00046-0.
- Murugasu E, Russell IJ. The effect of efferent stimulation on basilar membrane displacement in the basal turn of the guinea pig cochlea. J. Neurosci. 1996;16:325–332. doi: 10.1523/JNEUROSCI.16-01-00325.1996.
- Nuttall AL, Guo M, Ren T, Dolan DF. Basilar membrane velocity noise. Hear. Res. 1997;114:35–42. doi: 10.1016/s0378-5955(97)00147-0.
- Oster G, Jaffe JS. Low frequency sounds from sustained contraction of human skeletal muscle. Biophys. J. 1980;30:119–127. doi: 10.1016/S0006-3495(80)85080-6.
- Pasanen EG, McFadden D. An automated procedure for identifying spontaneous otoacoustic emissions. J. Acoust. Soc. Am. 2000;108:1105–1116. doi: 10.1121/1.1287026.
- Picton TW, Hillyard SA. Human auditory evoked potentials. II. Effects of attention. Electroenceph. and Clinical Neurophys. 1971;36:191–199. doi: 10.1016/0013-4694(74)90156-4.
- Puel J-L, Bonfils P, Pujol R. Selective attention modifies the active micromechanical properties of the cochlea. Brain Res. 1988;447:380–383. doi: 10.1016/0006-8993(88)91144-4.
- Puria S. Measurements of human middle ear forward and reverse acoustics: Implications for otoacoustic emissions. J. Acoust. Soc. Am. 2003;113:2773–2789. doi: 10.1121/1.1564018.
- Rabbitt RD, Clifford S, Breneman KD, Farrell B, Brownell WE. Power efficiency of outer hair cell somatic electromotility. PLoS Comput. Biol. 2009;5(7):e1000444. doi: 10.1371/journal.pcbi.1000444.
- Rasmussen GL. The olivary peduncle and other fiber projections of the superior olivary complex. J. Comp. Neurol. 1946;84:141–219. doi: 10.1002/cne.900840204.
- Rasmussen GL. Further observations of the efferent cochlear bundle. J. Comp. Neurol. 1953;99:61–74. doi: 10.1002/cne.900990105.
- Ren T, Zhang M, Nuttall AL, Miller JM. Heart beat modulation of spontaneous otoacoustic emissions in guinea pig. Acta Otolaryngol. (Stockh) 1995;115:725–731. doi: 10.3109/00016489509139393.
- Shera CA. Mammalian spontaneous otoacoustic emissions are amplitude-stabilized cochlear standing waves. J. Acoust. Soc. Am. 2003;114:244–262. doi: 10.1121/1.1575750.
- Sridhar TS, Liberman MC, Brown MC, Sewell WF. A novel cholinergic “slow effect” of efferent stimulation on cochlear potentials in the guinea pig. J. Neurosci. 1995;15:3667–3678. doi: 10.1523/JNEUROSCI.15-05-03667.1995.
- Svrcek-Seiler WA, Gebeshuber IC, Rattay F, Biro TS, Markum H. Micromechanical models for the Brownian motion of hair cell stereocilia. J. Theor. Biol. 1998;193:623–630. doi: 10.1006/jtbi.1998.0729.
- Treisman AM. Strategies and models of selective attention. Psych. Rev. 1969;76:282–299. doi: 10.1037/h0027242.
- Veuillet E, Collet L, Duclaux R. Effect of contralateral acoustic stimulation on active cochlear micromechanical properties in human subjects: dependence on stimulus variables. J. Neurophysiol. 1991;65:724–735. doi: 10.1152/jn.1991.65.3.724.
- Walsh KP, Pasanen EG, McFadden D. Properties of a nonlinear version of the stimulus-frequency otoacoustic emission. J. Acoust. Soc. Am. 2010a;127:955–969. doi: 10.1121/1.3279832.
- Walsh KP, Pasanen EG, McFadden D. Overshoot measured physiologically and psychophysically in the same human ears. Hear. Res. 2010b;268:22–37. doi: 10.1016/j.heares.2010.04.007.
- Walsh KP, Pasanen EG, McFadden D. Selective attention reduces physiological noise in the external ear canals of humans. II: Visual attention. Submitted to Hear. Res. 2014 Jul; doi: 10.1016/j.heares.2014.03.013. 2013.
- Warr WB, Guinan JJ., Jr. Efferent innervation of the organ of Corti: two separate systems. Brain Res. 1979;173(1):152–155. doi: 10.1016/0006-8993(79)91104-1.
- Wiederhold ML, Kiang NYS. Effects of electric stimulation of the crossed olivocochlear bundle on single auditory-nerve fibers in the cat. J. Acoust. Soc. Am. 1970;48:950–965. doi: 10.1121/1.1912234.
- Wilson JP. Evidence for a cochlear origin for acoustic reemissions, threshold fine-structure and tonal tinnitus. Hear. Res. 1980;2:233–252. doi: 10.1016/0378-5955(80)90060-x.
- Wit HP, Ritsma RJ. Stimulated acoustic emissions from the human ear. J. Acoust. Soc. Am. 1979;66:911–913.
- Worden FG. Auditory habituation. In: Peeke H, Herz M, editors. Habituation. Academic Press; New York: 1973. pp. 109–137.