Abstract
A procedure for extracting the nonlinear component of the stimulus-frequency otoacoustic emission (SFOAE) is described. The resulting measure, the nSFOAE, quantifies the amount by which the cochlear response deviates from linear additivity when the input stimulus is doubled in amplitude. When a 4.0‐kHz tone was presented alone, the magnitude of the nSFOAE response remained essentially constant throughout the 400‐ms duration of the tone; response magnitude did increase monotonically with increasing tone level. When a wideband noise was presented alone, nSFOAE magnitude increased over the initial 100‐ to 200‐ms portion of the 400‐ms duration of the noise. When the tone and the wideband noise were presented simultaneously, nSFOAE magnitude decreased momentarily, then increased substantially for about the first 100 ms and then remained strong for the remainder of the presentation. Manipulations of the noise bandwidth revealed that the low-frequency components were primarily responsible for this rising, dynamic response; no rising segment was seen with bandpass or highpass noise. The rising, dynamic nSFOAE response is likely attributable to activation of the medial olivocochlear efferent system. This perstimulatory emission appears to have the potential to provide information about the earliest stages of auditory processing for stimuli commonly used in psychoacoustical tasks.
INTRODUCTION
One type of otoacoustic emission (OAE) is the stimulus-frequency OAE (SFOAE). Kemp was the first to measure SFOAEs (Kemp and Chum, 1980; Kemp, 1980), and since then various procedures have been used to extract versions of the SFOAE. Fundamental characteristics of SFOAEs are that they are measured during the presentation of a (much-stronger) acoustic stimulus, typically a tone, and the frequency components present in the SFOAE correspond to frequency components in the acoustic stimulus. Here we describe a procedure that extracts the nonlinear component of the SFOAE response and that appears to have promise for studying cochlear processes during the presentation of waveforms commonly used in behavioral studies of hearing.
The auditory system consists of a series of early mechanical stages of processing of the incoming sound at the auditory periphery followed by numerous neural stages of processing in the brain, all culminating in auditory perception and the initiation of responses to the acoustical world. Each successive stage of processing has the potential to perpetuate, elaborate, offset, and/or augment aspects of the earlier stages of processing. An ultimate goal of auditory science is to understand how and where in this series of processing stages various details of auditory perception arise and are refined. For example, one might ask where and how the critical band first arises, and where and how it is refined to have the final characteristics it has in detailed behavioral tests.
To date, our greatest knowledge about auditory performance and perception has come from psychophysical studies of humans, and our greatest knowledge about the processing characteristics of the auditory periphery and brain has come from physiological studies on non-human species. This partitioning of knowledge by species is likely to persist for some time because of the inherent difficulties associated with obtaining appropriate measurements of the successive neural stages in humans. However, the physiological measure described here appears able to provide auditory science with knowledge about some early stages of processing in humans during the presentation of a wide array of stimulus waveforms, and offers a noninvasive technique for animal research as well.
The procedures described here are closely related to those used by Keefe (e.g., Keefe, 1998; Schairer et al., 2003; Schairer and Keefe, 2005; Keefe et al., 2009) and Guinan (e.g., Guinan, 2006; Guinan et al., 2003; Backus and Guinan, 2006) to measure SFOAEs, but the stimuli, procedures, and analyses do differ in several significant details. More importantly, several of our key outcomes differ from those reported previously. Procedures used by previous investigators have measured a combination of the linear and nonlinear components of the SFOAE, whereas our procedure extracts only the nonlinear component. Ultimately, this difference may explain some of the differences in outcomes that have been observed with the various procedures. We use the term nSFOAE when referring to our measure, both to distinguish it from other SFOAE measures and to emphasize that it contains only the nonlinear component of the physiological response.
Our long-term goal is to compare the measurements obtained using certain forms of OAE with behavioral performance measured in various psychophysical tasks when using the same ears, the same acoustic stimuli, and the same basic procedures in both domains. Specifically, the purpose of this research is to determine how much the processing characteristics of the active human cochlea contribute to performance in certain psychoacoustical tasks, and how much of the individual differences that exist psychoacoustically can be explained by individual differences in cochlear response. However, prior to reporting on specific psychoacoustical tasks, such as overshoot, forward masking, critical bands, etc., some of the basic properties of the nSFOAE need to be documented, and that is the purpose of this report. Additional nSFOAE measurements made as part of this study are reported in the second paper in this series.
METHODS
Subjects
One female aged 21, two males aged 19, and one male aged 26 participated in this study. All had audiometrically normal hearing sensitivity (≤15 dB Hearing Level) in both ears for the standard audiometric frequencies between 250 and 8000 Hz and normal middle-ear function as measured by a clinical audiometric screening device (Auto Tymp 38, GSI/VIASYS, Madison, WI). No subject had an SOAE stronger than −9.0 dB sound-pressure level (SPL) any closer than 640 Hz to the 4.0‐kHz tone used for most measurements here. The male subjects all had extensive experience with auditory psychophysics, and all four subjects had their OAEs measured at least once prior to this study. Informed consent was obtained from all subjects prior to the study, and all were paid for their participation except for author KW.
Procedures
The OAE measure used here is related to the SFOAE that was first measured by Kemp and Chum (1980) and later studied by Kemp (1980), Zwicker and Schloth (1984), Dallmayr (1987), and Lonsbury-Martin et al. (1990), among others (see Probst et al., 1991, for an early review). In recent times, SFOAEs have been used productively by a number of prominent investigators studying an assortment of basic phenomena in various species. Numerous procedures for extracting the SFOAE have been reported, but the common factor is that the measurements are made during the presentation of an acoustic stimulus, typically a stimulus of long duration. As noted, this perstimulatory response contains both linear and nonlinear components, and most procedures measure a combination of the two, whereas the procedure to be described here extracts only the nonlinear component, hence the name nSFOAE.
The primary goal of the work reported here was to explore how the nSFOAE measure typically responds to manipulations of a number of basic stimulus parameters. Complete sets of data were collected from all subjects for the majority of conditions, but for some conditions, data were collected from only one or two subjects, and those conditions are identified where appropriate. The data are described largely in terms of general patterns of the nSFOAE response, and representative results are shown in the figures. Comparisons are made between subjects’ data, but the possible implications of individual differences on psychophysical performance are not explored here.
For the OAE measurements, an individual subject was seated in a comfortable reclining chair in a double-walled, sound-attenuated room. Prior to any data collection, the subject relaxed in isolation in this room for 15 min. This initialization period has been shown to stabilize OAE measurements (Whitehead, 1991; McFadden and Pasanen, 1994). The OAEs were measured in the right ear in all subjects.
The equipment used for measuring nSFOAEs included an Etymotic ER-10A microphone system and two Etymotic ER-2 earphones (Etymotic, Elk Grove Village, IL). The earphones were attached to small plastic sound-delivery tubes that terminated at the sound-delivery ports located at the outboard end of the microphone capsule. Metal tubes passed through the probe tip of the microphone system for the delivery of sound into the external ear canal. The acoustic stimuli were produced, and the nSFOAEs recorded, using a Macintosh G4 computer running custom-written LabVIEW® software.
The microphone output was amplified by 20 dB by the Etymotic preamplifier, then passed to a custom-built amplifier/filter that highpass filtered the waveform at about 400 Hz to eliminate low-frequency noise. The waveform also was lowpass filtered at about 15 kHz prior to being delivered to an analog-to-digital converter. All acoustic stimuli were generated digitally in the computer, delivered to a digital-to-analog converter, passed through a custom-built earphone amplifier, and then delivered to the earphones. The sampling rate for both input and output was 50 kHz with 16‐bit resolution. Digitizing was accomplished using a National Instruments board (PCI-MIO-16XE-10) installed in the Macintosh G4 computer.
Our procedure for extracting the nSFOAE borrows elements from several previous reports (e.g., Keefe, 1998; Schairer et al., 2003; Guinan et al., 2003). For data collection, the stimulus waveforms were organized in sets of three successive presentations (a triplet), and a block consisted of at least 50 triplets. The first stimulus presentation of each triplet involved activating only one of the two Etymotic ER-2 earphones, the second presentation involved activating only the other earphone, and the third presentation involved the simultaneous activation of both earphones playing the same digitized waveform (a version of the “double-evoked” procedure described by Keefe, 1998). For all triplets in a block, the stimulus waveforms were identical. The voltage delivered to the individual earphones was always exactly the same for all triplets for a particular condition; so in a strictly linear system, the instantaneous pressure fluctuations for the third presentation should have been the exact acoustic sum of the instantaneous fluctuations in the first two presentations.
The sound in the ear canal was recorded during all three presentations for each triplet. Those sounds consisted of the (quite strong) acoustic stimulus plus whatever (weak) sound was being produced inside the cochlea in response to the acoustic stimulus. The sounds recorded during each of the first two presentations of a triplet were summed, and that sum was subtracted from the sound recorded during the third presentation of that triplet. (This subtraction was performed prior to the presentation of the next triplet in order to assess that sample’s freedom from artifact and noise; see below.) The acoustic stimuli always were the same digitized waveforms, so ideally the result of this subtraction was the elimination of both the physical stimulus and any linear components of the cochlear response. Assuming that the recording system itself is linear, any residual left in the difference waveform can be attributed to nonlinearity in the cochlear response (see Keefe, 1998)—here called the nSFOAE. The nSFOAE response might best be thought of as the difference between the actual measurement in the ear canal and the expectation about that measurement based on the process of additivity in a strictly linear system.
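The core of the double-evoked subtraction can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' LabVIEW implementation: the function name `nsfoae_residual` and the two toy "ear" models (a pure gain and a `tanh` compressive nonlinearity) are hypothetical stand-ins, chosen only to show that a linear system cancels exactly while a compressive one leaves a residual.

```python
import numpy as np


def nsfoae_residual(p1, p2, p3):
    """Double-evoked difference p3 - (p1 + p2).

    p1, p2 -- ear-canal recordings with only earphone 1 or only earphone 2 active
    p3     -- recording with both earphones playing the same digitized waveform
    """
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    return p3 - (p1 + p2)


fs = 50_000                                # 50-kHz sampling rate, as in the text
t = np.arange(int(0.02 * fs)) / fs
x = 0.1 * np.sin(2 * np.pi * 4000 * t)     # waveform delivered by each earphone


def linear_ear(s):
    # Hypothetical strictly linear system (pure gain)
    return 2.0 * s


def compressive_ear(s):
    # Hypothetical saturating nonlinearity standing in for cochlear compression
    return np.tanh(2.0 * s)


# In presentation 3 both earphones play x, so the acoustic input is 2x.
lin = nsfoae_residual(linear_ear(x), linear_ear(x), linear_ear(2 * x))
nonlin = nsfoae_residual(compressive_ear(x), compressive_ear(x), compressive_ear(2 * x))
print(np.max(np.abs(lin)))      # zero: the linear model cancels completely
print(np.max(np.abs(nonlin)))   # clearly nonzero: the nonlinear residual survives
```

With the compressive model the residual has the opposite sign to the stimulus at each instant, because doubling the input less than doubles the output.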
Those difference waveforms determined to be artifact- and noise-free (see below) were averaged, and data collection continued until at least 50 such waveforms were collected. The resulting averaged waveform was saved and later subjected to an analysis (see below) that yielded our estimate of the nSFOAE to the acoustic stimuli used for that block of triplets. The assumption that the difference waveform originates primarily from the cochlea is strengthened by the fact that the earphones and microphone used here were highly linear in their responses when our system was tested with a (strictly linear) 0.5‐cc syringe instead of an ear canal. In the syringe, the nonlinear response was about −15 to −20 dB SPL in the 400‐Hz band used for analysis (see below), which was well below the values typically measured in an ear canal, and was not distinguishable from the noise floor of the measurement system with stimulus amplitudes set to zero.
The sound-pressure levels produced by the ER-2 earphones when driven by the waveforms produced by the computer were calibrated using a 0.5‐in. pressure microphone (B&K 4134) and sound-level meter (B&K 2215) mounted to a Zwislocki coupler (DB4005). The AC output of the sound-level meter was delivered to a spectrum analyzer (Hewlett-Packard model 35665). In all cases the ER-2 earphones were connected to the ER-10A OAE microphone in the same configuration used for the OAE measurements. An ER10-14 foam ear tip was placed on the OAE microphone, and the ear tip was completely inserted into the ear-canal extension of the coupler. Swept-sine measurements of the frequency response of the ER-2 itself in this configuration revealed a uniform response within ±1 dB between 0.5 and 6.5 kHz. The net frequency response of the entire sound-delivery system was evaluated using computer-generated white noise. Irregularities in the frequency response were corrected by adjusting the magnitude spectrum in the noise-generation program. When the calibration procedure was checked by placing the microphone assembly in the coupler as described above, the stimulus levels were verified to be within 1 dB of the intended values.²
Immediately prior to each data-collection block, with the microphone assembly fitted in the ear canal, the level of a 500‐Hz calibration tone was adjusted to produce 65 dB SPL, and the resulting calibration factor was used to set presentation levels for the tone and noise waveforms to be used during that block of triplets. Next, a series of about 16 triplets was presented over the course of about 50 s, the rms value of each nSFOAE was calculated and saved, and that distribution of values was used later when deciding which individual responses to include in the accumulating average nSFOAE and which to exclude as being possibly contaminated by physiological noise. Then a recording was obtained during a 20‐s period of silence to provide an estimate of the ambient noise level in the ear canal in the absence of our acoustic stimulus. Actual data collection followed these preliminaries.
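One way to picture how a single in-ear calibration factor can set presentation levels is sketched below. It assumes, as a simplification, that the sound-delivery path is linear and (after the coupler-based spectral correction) flat, so that a level change of ΔL decibels corresponds to an amplitude ratio of 10^(ΔL/20). The function name `drive_amplitude` is hypothetical.

```python
def drive_amplitude(target_db_spl, cal_amplitude, cal_db_spl=65.0):
    """Digital amplitude (peak or rms, same convention as cal_amplitude)
    expected to produce target_db_spl in the ear canal.

    cal_amplitude is the amplitude that produced cal_db_spl for the in-ear
    calibration tone. Assumes a linear, spectrally flat delivery path.
    """
    return cal_amplitude * 10.0 ** ((target_db_spl - cal_db_spl) / 20.0)


# Example: if amplitude 0.2 gave 65 dB SPL, a 60-dB tone needs about 0.112.
tone_amp_60 = drive_amplitude(60.0, 0.2)
```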
During the data-collection process, the difference waveforms obtained from individual triplets were evaluated in multiple ways prior to being accepted for use in the accumulating average. For the first 20 of the 50 individual nSFOAE responses making up an averaged nSFOAE, the difference waveform from each triplet had to satisfy the noise and artifact criteria in two ways. First, the difference waveform was compared to the distribution of 16 responses described in the preceding paragraph and was included in the accumulating nSFOAE average only if its rms level was less than 0.25 standard deviation (SD) above the median of that distribution. Second, the difference waveform from each triplet was subtracted point-for-point from the accumulating nSFOAE average, the rms of that difference was calculated, and only if the magnitude of that rms value was less than 6 dB above the ambient noise level in the quiet (as measured earlier) was that individual nSFOAE response added to the accumulating average. For the final 30 of the individual nSFOAE responses, the difference waveform from a triplet was added to the accumulating average only if it satisfied the second of these criteria. Typically about 60–65 triplets needed to be presented to acquire 50 usable difference waveforms from our highly experienced subjects.
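The two inclusion criteria can be expressed as a small decision function plus a running average. This is a sketch under stated assumptions, not the authors' code: the names `accept` and `collect` are hypothetical, the population standard deviation (`ddof=0`) is assumed because the text does not say which SD was used, and the very first waveform is assumed to skip the second criterion because no average exists yet.

```python
import numpy as np


def rms(x):
    return float(np.sqrt(np.mean(np.square(x))))


def accept(diff, running_avg, n_accepted, baseline_rms, quiet_rms):
    """Return True if one triplet's difference waveform may join the average.

    diff         -- difference waveform from one triplet
    running_avg  -- accumulating average so far (None before the first acceptance)
    n_accepted   -- number of waveforms already in the average
    baseline_rms -- rms values from the ~16 preliminary triplets
    quiet_rms    -- rms of the ear-canal recording made in silence
    """
    diff = np.asarray(diff, dtype=float)
    if n_accepted < 20:
        # Criterion 1 (first 20 only): rms below median + 0.25 SD of the baseline.
        # np.std gives the population SD (ddof=0); this choice is an assumption.
        limit = np.median(baseline_rms) + 0.25 * np.std(baseline_rms)
        if rms(diff) >= limit:
            return False
    if running_avg is not None:
        # Criterion 2 (all waveforms): rms of (diff - average) must stay below
        # the quiet noise level + 6 dB, compared here in linear units.
        if rms(diff - running_avg) >= quiet_rms * 10.0 ** (6.0 / 20.0):
            return False
    return True


def collect(triplet_diffs, baseline_rms, quiet_rms, n_target=50):
    """Accumulate a running mean of accepted waveforms until n_target are in."""
    avg, n = None, 0
    for diff in triplet_diffs:
        diff = np.asarray(diff, dtype=float)
        if accept(diff, avg, n, baseline_rms, quiet_rms):
            avg = diff.copy() if avg is None else avg + (diff - avg) / (n + 1)
            n += 1
            if n == n_target:
                break
    return avg, n
```

Keeping the mean as a running update means each new candidate can be compared against the current average immediately, which matches the text's point that the check was done before the next triplet was presented.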
Three basic stimulus configurations were used for testing: (1) a tone presented alone, (2) a sample of noise presented alone, and (3) a tone presented along with a sample of noise. In the majority of conditions, the tone was 4.0 kHz and 500 ms in duration, but for some conditions, the duration was reduced to 10 ms. The tone always was gated using a 5‐ms cosine-squared rise and decay, and the noise was gated using a 2‐ms rise and decay. For the majority of the measurements reported, the onset of the noise lagged the onset of the tone by 100 ms. Typically, the noise was band-limited between 0.1 and 6.0 kHz and had an overall level of about 63 dB SPL (a spectrum level of about 25 dB), but for some conditions, only lowpass, bandpass, or highpass versions of the noise were used. Lowpass and highpass noise bands had filter cutoffs 200 Hz below or above the tone frequency, respectively, and the bandpass noise was the complement of this arrangement—a 400‐Hz noise band centered on the tone frequency. The noise was digitally synthesized using an inverse fast-Fourier-transform (FFT) procedure, which provided infinitely steep spectral cutoffs prior to transduction by the ER-2 earphones. The duration of the noise was varied across conditions, but typically was 400 ms.
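The stimulus construction for the standard tone-plus-noise condition can be sketched with NumPy as follows. Several details are assumptions rather than statements from the text: the noise is given a flat magnitude spectrum with random phases inside the passband (one common inverse-FFT recipe), the noise's 2-ms ramps are given the same cosine-squared shape as the tone's, and the function names and arbitrary amplitude units are hypothetical (absolute levels would come from the in-ear calibration).

```python
import numpy as np

FS = 50_000  # sampling rate (Hz), as stated in the text


def cos2_gate(n_samples, ramp_ms, fs=FS):
    """Unit envelope with cosine-squared onset and offset ramps."""
    env = np.ones(n_samples)
    n_ramp = int(round(ramp_ms * 1e-3 * fs))
    # sin^2 rises from 0 toward 1; it is the same shape as a falling cos^2
    ramp = np.sin(0.5 * np.pi * np.arange(n_ramp) / n_ramp) ** 2
    env[:n_ramp] = ramp
    env[-n_ramp:] = ramp[::-1]
    return env


def band_noise(dur_s, f_lo, f_hi, rng, fs=FS):
    """Inverse-FFT noise: flat magnitude and random phase in band, zero outside."""
    n = int(round(dur_s * fs))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec = np.zeros(freqs.size, dtype=complex)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    spec[band] = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, band.sum()))
    x = np.fft.irfft(spec, n)
    return x / np.sqrt(np.mean(x ** 2))  # scale to unit rms


def tone_plus_noise(tone_amp, noise_rms, rng, f_tone=4000.0,
                    f_lo=100.0, f_hi=6000.0, fs=FS):
    """500-ms tone with 5-ms ramps plus 400-ms noise with 2-ms ramps, onset delayed 100 ms."""
    n_tone = int(round(0.5 * fs))
    t = np.arange(n_tone) / fs
    tone = tone_amp * np.sin(2 * np.pi * f_tone * t) * cos2_gate(n_tone, 5.0, fs)
    noise = band_noise(0.4, f_lo, f_hi, rng, fs)
    noise = noise_rms * noise * cos2_gate(noise.size, 2.0, fs)
    out = tone.copy()
    start = int(round(0.1 * fs))
    out[start:start + noise.size] += noise
    return out


# Generate the waveform once (fixed seed) and reuse it for every triplet of a condition.
stimulus = tone_plus_noise(1.0, 0.05, np.random.default_rng(seed=1))
```

Setting the spectrum to exactly zero outside the band is what gives the "infinitely steep" digital cutoffs mentioned in the text; the earphone's own response still shapes the acoustic spectrum afterward.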
The tone and noise always were the same synthesized waveforms for all triplets of a condition, and their starting phases always were the same across presentations, conditions, and subjects. Stimulus presentations within and between triplets were separated by silent intervals of approximately 500‐ms duration except that after every sixth presentation (two triplets), the silent interval was about 2000 ms to allow for the necessary calculations regarding inclusion. To minimize any periodicities from the stimulus train, the actual separations between successive presentations varied randomly between 490 and 510 ms in 1‐ms steps.
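A presentation schedule with the jittered silent intervals might be generated as below. This is a hypothetical sketch: the text gives the roughly 2000-ms pause after every sixth presentation only as "about 2000 ms", so treating it as a fixed value is an assumption, and the function name is invented for illustration.

```python
import numpy as np


def silent_intervals_ms(n_presentations, rng):
    """Silent interval (ms) that follows each presentation in a block."""
    # 490-510 ms in 1-ms steps (the upper bound of integers() is exclusive)
    gaps = rng.integers(490, 511, size=n_presentations).astype(float)
    # After every sixth presentation (two triplets), a ~2000-ms pause for the
    # inclusion calculations; treated here as exactly 2000 ms (an assumption).
    gaps[5::6] = 2000.0
    return gaps


gaps = silent_intervals_ms(18, np.random.default_rng(seed=0))
```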
The averaged nSFOAE waveforms were analyzed offline by taking successive 20‐ms time segments and measuring their rms levels at the output of a 400‐Hz-wide elliptic bandpass digital filter (sixth order) centered on the frequency of the 4.0‐kHz tone. This filter bandwidth corresponds approximately to one critical band at 4.0 kHz. The time window was rectangular and typically it was moved in 1‐ms steps beginning at stimulus onset. This succession of rms values was converted to decibels sound-pressure level and used as the nSFOAE response over time. When the duration of the tone was 10 ms, a 10‐ms window was used to analyze the nSFOAE response in order to improve temporal resolution. (The 20‐ms window typically used here was a compromise between the amount of smoothing and the precision of temporal resolution.) For purposes of presentation, only every fifth value from the moving analysis window was plotted in the figures.
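The offline analysis (bandpass filtering followed by a sliding rectangular rms window) can be sketched with SciPy. Some parameters are assumptions the text does not specify: "sixth order" is read as a third-order elliptic prototype, which `scipy.signal.ellip` turns into a sixth-order bandpass; the 0.5-dB passband ripple and 60-dB stopband attenuation are illustrative values; the filter is applied causally with `sosfilt`; and the input is assumed to be calibrated in pascals so that 20 µPa is the dB SPL reference.

```python
import numpy as np
from scipy import signal

FS = 50_000  # sampling rate (Hz)


def nsfoae_level_track(avg_waveform, f_center=4000.0, bw=400.0,
                       win_ms=20.0, step_ms=1.0, fs=FS):
    """Return (window start times in s, level in dB SPL) for a sliding rms window."""
    # Third-order elliptic prototype -> sixth-order bandpass (assumed reading);
    # rp = 0.5 dB and rs = 60 dB are illustrative choices, not from the text.
    sos = signal.ellip(3, 0.5, 60, [f_center - bw / 2, f_center + bw / 2],
                       btype="bandpass", fs=fs, output="sos")
    y = signal.sosfilt(sos, np.asarray(avg_waveform, dtype=float))
    win = int(round(win_ms * 1e-3 * fs))
    step = int(round(step_ms * 1e-3 * fs))
    starts = np.arange(0, y.size - win + 1, step)
    # Rectangular window: plain rms over each 20-ms segment
    rms = np.array([np.sqrt(np.mean(y[s:s + win] ** 2)) for s in starts])
    level_db = 20.0 * np.log10(rms / 20e-6)  # assumes input in pascals
    return starts / fs, level_db
```

For the 10-ms tones, calling the same function with `win_ms=10.0` would reproduce the shorter analysis window described in the text; plotting `level_db[::5]` corresponds to showing every fifth value.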
Various aspects of the procedure contributed to stable results. The subjects reclined in a comfortable armchair, and the head was held in position using a pillow. The subjects all were highly experienced and had learned to remain quite still during testing. The levels of the tone and/or noise were measured in the ear canal and adjusted as necessary prior to every block of triplets. The difference waveform from each triplet was compared with the accumulated average of difference waveforms for that block before being added to that average (see above), meaning that changes in the fit of the probe tip in the ear canal easily could be detected, at which point the block was restarted.
The Institutional Review Board at The University of Texas at Austin approved this research protocol.
RESULTS
Tone alone
Tones of 4.0 kHz and 400 ms in duration were presented at multiple sound-pressure levels to each of the subjects. Representative results are shown in Fig. 1. As can be seen, the strength of the nSFOAE remained essentially constant throughout the duration of the tone, and the nSFOAE increased monotonically in magnitude as the level of the tone was increased; that is, the magnitude of the nonlinear component of the SFOAE increased. Similar results were obtained for the other subjects, although two subjects had low-frequency fluctuations in the nSFOAE response magnitude when a tone of about 70 or 75 dB SPL was presented, likely reflecting activation of the middle-ear reflex (MER).
Noise alone
Typically, the same sample of synthesized wideband noise (0.1–6.0 kHz) was used across subjects and for all conditions. When this sample of noise was presented alone, the nSFOAE response did increase monotonically with the level of the noise, but the response was more irregular than the nSFOAE to a tone presented alone (see Fig. 2). Much of this irregularity apparently is attributable to the nSFOAE “following” the fluctuations in amplitude that are inherent in the envelope of the noise waveform. This interpretation is supported by the similarity in the shapes of the nSFOAEs obtained for different subjects and at different noise levels (also, see Fig. 3 below).
As evidence that the wide fluctuations in the nSFOAE response to noise-alone in Fig. 2 are attributable to the envelope characteristics of the specific noise sample used to collect the data, nSFOAEs were collected for the noise-alone condition using two additional samples of wideband noise. The results are shown in Fig. 3, where it can be seen that the three different noise waveforms produced responses having unique irregularities but generally similar overall patterns.
Part of the overall pattern of response to noise-alone in Fig. 3 is that the nSFOAE can exhibit a slowly rising, dynamic segment over at least the first 100–200 ms of presentation. This effect occurs only for a relatively narrow range of noise spectrum levels. Careful examination of Fig. 2 reveals a rising, dynamic segment to the response for spectrum levels of 20 and 25 dB, but not for noises stronger or weaker than that, and this was true for other ears as well. For the weakest noise level used (15 dB spectrum level), a horizontal line appeared to fit the nSFOAE responses over the full 400‐ms duration of the noise-alone presentation for all four subjects, just like the tone-alone response.
Figure 4 summarizes the effect of stimulus level on the magnitude of the nSFOAE response for both the tone-alone and the noise-alone conditions (Figs. 1 and 2). Similar data were obtained for all four subjects, and regression lines were fitted to the individual data. Across subjects, the average slopes were 1.01 (SD=0.24) and 1.13 (SD=0.18) dB per decibel of stimulus level for the tone-alone and noise-alone conditions, respectively, and for each of the subjects, the pairs of slopes were quite similar. For comparison, Schairer et al. (2006, p. 910) reported increases of about 0.75 dB per decibel of tone level using their SFOAE procedure with a 4‐kHz tone.
To obtain the values used for the fits plotted in Fig. 4, the magnitudes of the nSFOAE responses were averaged across successive 20‐ms analysis windows over the range of 150–350 ms of the 400‐ms common duration of the sounds. This range was selected in order to emphasize the steady-state segments of the responses. Note in Fig. 4 that when the overall noise level was about 73 dB (35 dB spectrum level), the magnitude of the nSFOAE response did not increase the way it had at weaker noise levels; accordingly, that value was not included in the fitting of the straight line to the noise-alone data.
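The steady-state averaging and the growth-function slope can be sketched as two small helpers. Two points are assumptions: the windowed levels are averaged in decibels (the text says only that "magnitudes" were averaged), and each window is indexed by its start time. The function names are hypothetical.

```python
import numpy as np


def steady_state_level(times_s, level_db, t0=0.150, t1=0.350):
    """Mean of the windowed nSFOAE levels (dB) whose window times fall in [t0, t1]."""
    times_s = np.asarray(times_s, dtype=float)
    level_db = np.asarray(level_db, dtype=float)
    sel = (times_s >= t0) & (times_s <= t1)
    return float(np.mean(level_db[sel]))


def growth_slope(stim_db, nsfoae_db):
    """Least-squares slope, in dB of nSFOAE per dB of stimulus level."""
    slope, _intercept = np.polyfit(stim_db, nsfoae_db, 1)
    return float(slope)
```

Following the text, a point at which the response stops growing (such as the 73-dB noise level) would simply be left out of the arrays passed to `growth_slope`.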
Tone plus noise
When the tone and the wideband noise were presented together, the result was distinctly different from the response to tone-alone even though the noise was weak compared to the tone. After a decline in nSFOAE magnitude that began immediately upon noise onset and lasted about 25 ms, the nSFOAE to the tone-plus-noise increased substantially in magnitude and then asymptoted at a level about 4–15 dB above where it began, depending upon the individual subject. Typical results for two subjects are shown in Fig. 5 for the condition in which the wideband noise (0.1–6.0 kHz) had a spectrum level of 25 dB and the 4.0‐kHz tone was 60 dB; the tone was 500 ms in duration and the 400‐ms noise was gated on 100 ms after the onset of the tone. For the subject in the top panel of Fig. 5, the immediate decline in the nSFOAE response following noise onset was sharper than for the subject in the bottom panel. (Evidence presented below suggests that this immediate decline upon noise onset is attributable to lateral, or two-tone, suppression.) The difference between the nSFOAE response to tone-alone and the response at asymptote to tone-plus-noise was about 7 dB for subject JZ (top panel) and about 8 dB for subject NH (bottom panel). Note that in this condition, the level of the tone in the 400‐Hz analysis band centered at 4.0 kHz averaged about 9 dB higher than the overall level of the noise in that band, suggesting that in the tone-plus-noise condition, the nSFOAE response consisted primarily of the cochlear response to the tone. Also, the nSFOAE response to tone-plus-noise was not a simple combination of the responses to tone-alone and noise-alone (below we suggest that activation of the efferent system in the tone-plus-noise condition may explain this departure from simple additivity).
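The roughly 9-dB tone-to-noise difference in the analysis band follows from the noise's 25-dB spectrum level, assuming a flat noise spectrum across the 400-Hz band and treating the analysis filter as ideal:

```latex
\begin{aligned}
L_{\text{noise}}^{(400\,\text{Hz})} &= L_{\text{spec}} + 10\log_{10}\!\left(\frac{400\ \text{Hz}}{1\ \text{Hz}}\right) \approx 25 + 26.0 = 51.0\ \text{dB SPL},\\
L_{\text{tone}} - L_{\text{noise}}^{(400\,\text{Hz})} &\approx 60 - 51.0 = 9.0\ \text{dB}.
\end{aligned}
```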
For each subject, the gradually rising nSFOAE response to tone-plus-noise was fitted with an exponential function of the form $y = a(1 - e^{bx}) + c$ (with $b < 0$, so that the time constant is $-1/b$), beginning 25 ms after noise onset. When the exponential function was fitted to the averaged nSFOAE waveforms obtained using our standard 20‐ms time window, the fits generally were very good, and the resulting time constants were 30.3, 23.5, 38.5, and 23.0 ms for subjects KW, JZ, SC, and NH, respectively. However, these values are strongly dependent upon the length of the time window used for the analysis. When the exponential function was fitted to the rising, dynamic response obtained from the same averaged waveforms using a 40‐ms analysis window, the estimated time constants were in the range of about 70–90 ms.
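A fit of this form can be sketched with `scipy.optimize.curve_fit`. The starting values (initial level, total rise, and a 30-ms time constant) are a crude assumption, and the function names are hypothetical; the example fits a synthetic, noise-free curve only to show that the time constant is recovered.

```python
import numpy as np
from scipy.optimize import curve_fit


def rising_exp(x, a, b, c):
    """y = a(1 - exp(b x)) + c; with b < 0 the curve rises toward a + c."""
    return a * (1.0 - np.exp(b * x)) + c


def fit_time_constant(t_ms, level_db, p0=None):
    """Fit the rising segment and return (tau_ms, (a, b, c)).

    t_ms is time relative to the start of the fit (25 ms after noise onset).
    """
    t_ms = np.asarray(t_ms, dtype=float)
    level_db = np.asarray(level_db, dtype=float)
    if p0 is None:
        # Crude starting values (an assumption): total rise, tau = 30 ms, initial level
        p0 = (level_db[-1] - level_db[0], -1.0 / 30.0, level_db[0])
    params, _cov = curve_fit(rising_exp, t_ms, level_db, p0=p0)
    a, b, c = params
    return -1.0 / b, (a, b, c)


t = np.arange(0.0, 300.0, 1.0)
y = rising_exp(t, 7.0, -1.0 / 25.0, 45.0)   # synthetic 7-dB rise, tau = 25 ms
tau, (a, b, c) = fit_time_constant(t, y)
```

Because the input here is a level track that has already been smoothed by the analysis window, the fitted time constant inherits that smoothing, which is consistent with the text's observation that a 40-ms window yields longer estimates than a 20-ms window.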
Spectral characteristics of the noise
Additional measurements revealed that the magnitude and time course of the dynamic response illustrated in Fig. 5 are highly dependent upon the spectral characteristics of the noise band used. Specifically, when the noise was low-passed below the frequency of the 4.0‐kHz tone (i.e., from 0.1 to 3.8 kHz), the nSFOAE response was similar to the tone-plus-noise response in Fig. 5, but was generally somewhat reduced in maximum magnitude. However, when the noise was either high-passed above the frequency of the tone (i.e., from 4.2 to 6.0 kHz) or band-passed around the frequency of the tone (i.e., from 3.8 to 4.2 kHz), the nSFOAE response did not show the characteristic dynamic rise seen in the wideband condition. That is, the mechanism operating to produce the rising, dynamic response at each specific frequency region apparently cannot be activated by acoustic energy just anywhere in the spectrum; rather, there needs to be acoustic energy in the frequency region below (apical to) the specific region of interest. These spectral effects are illustrated in Fig. 6 for two subjects. For all of these noise bandwidths, the spectrum level of the noise was held constant at about 25 dB; the corresponding overall levels for the wideband, lowpass, bandpass, and highpass noises were approximately 63, 61, 51, and 58 dB SPL, respectively. When overall level was equated across the different noise bands (at 63 dB SPL, same as the wideband noise), the same pattern of results was observed.
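The overall levels quoted for the four noise bands follow from the common 25-dB spectrum level plus 10 log10 of each bandwidth, assuming a flat spectrum within each band. A short check in Python (the function name is hypothetical):

```python
import math


def overall_level_db(spectrum_level_db, f_lo_hz, f_hi_hz):
    """Overall level of flat-spectrum noise: spectrum level + 10*log10(bandwidth in Hz)."""
    return spectrum_level_db + 10.0 * math.log10(f_hi_hz - f_lo_hz)


bands_hz = {
    "wideband": (100, 6000),
    "lowpass": (100, 3800),
    "bandpass": (3800, 4200),
    "highpass": (4200, 6000),
}
for name, (lo, hi) in bands_hz.items():
    print(name, round(overall_level_db(25.0, lo, hi)))
# wideband 63, lowpass 61, bandpass 51, highpass 58
```

By the same relation, equating every band at 63 dB SPL requires raising the spectrum level of the narrower bands, by about 12 dB for the 400-Hz bandpass noise.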
Not only did the highpass noise not produce a rising, dynamic response, but at noise onset, the nSFOAE for tone-plus-highpass-noise typically fell rapidly to a value lying somewhere below the responses for both tone-alone and tone-plus-bandpass-noise and remained there for the duration of the stimuli. Similar effects were observed in other subjects with this combination of levels for tone and highpass noise. When the level of the tone was increased by 6 dB and the spectrum level of the highpass noise was decreased by about 5 dB, the rapid fall in nSFOAE magnitude was not observed; rather, the nSFOAE was approximately equal to that seen for tone-alone or tone-plus-bandpass-noise. These facts suggest that the weakening of the nSFOAE response with highpass noise depends upon the relative levels of the tone and noise. We believe that this immediate decline represents a form of lateral (or two-tone) suppression (Kemp and Chum, 1980; Shannon, 1976) operating from the highpass noise to the tone; additional evidence is presented in Sec. 3E.
Note that the failure of our bandpass noise to produce a rising, dynamic response surely was dependent, in part, upon the relatively narrow bandwidth of that noise. Bandpass noises of greater bandwidth than used here would activate the low-frequency region that clearly has considerable power to initiate a rising, dynamic response (see Fig. 6). In accord with this reasoning, Lilaonitkul and Guinan (2009a) reported that noise centered on their tone was highly effective at triggering their SFOAE measure, unlike the result here, but their noise had a bandwidth of one-half octave. When they used noises of half-octave bandwidth centered either below or above the tone, both were capable of initiating an SFOAE response but the lower noise band was the more effective. That asymmetry is similar to the present results except that our highpass noise produced either a suppressive effect or no change in the nSFOAE response from its strength for tone-alone, depending upon the individual subject and the levels of the tone and the highpass noise.
Supplementary bandwidth-manipulation measurements were made for subject NH using a 2.5‐kHz tone and lowpass, bandpass, and highpass noise bands correspondingly shifted down in frequency. The bandpass noise was 400‐Hz wide, centered on the 2.5‐kHz tone, and the lowpass and highpass noise bands had frequency cutoffs 200 Hz below and above the frequency of the 2.5‐kHz tone, respectively. These nSFOAE responses were analyzed using a 250‐Hz-wide elliptic bandpass filter centered on the frequency of the 2.5‐kHz tone, but otherwise were analyzed the same as were the responses to the 4.0‐kHz tone. The pattern of results was basically the same as in Fig. 6, suggesting that the spectral asymmetry is an effect of relative frequency, not absolute frequency.
The lowpass noise used to collect the data shown in Fig. 6 had a bandwidth of 0.1–3.8 kHz. In supplementary measurements with subject KW, the upper shoulder frequency of the lowpass noise was decreased gradually, thereby increasing the frequency separation between the 4.0‐kHz tone and the shoulder frequency. The cutoff frequencies for the noise were 3.8, 3.5, 3.0, 2.5, and 2.0 kHz, and the attenuation beyond the cutoff frequency was quite steep (because the noise was digitally synthesized using an inverse FFT). The spectrum level of the noise was held constant across these bandwidths. Once the bandwidth of the lowpass noise was 0.1–2.0 kHz, the nSFOAE response ceased to exhibit a rising segment following noise onset and instead looked like the no-rise responses measured with the bandpass and highpass noise bands in Fig. 6.
Contralateral noise
The results in Fig. 6 revealed that the nSFOAE response to tone-plus-highpass-noise could be substantially weaker than the response to tone-alone. One interpretation of this outcome was that the highpass noise failed to activate the mechanism responsible for the rising, dynamic segment of the nSFOAE response, but it did produce lateral (two-tone) suppression (Kemp and Chum, 1980; Shannon, 1976) on the 4.0‐kHz region of the basilar membrane. Because lateral suppression is a monaural mechanism, a test of this interpretation would be to move the noise stimulus to the opposite ear.
In accord with this reasoning, additional nSFOAE data were collected from three subjects using a modified procedure for stimulus presentation. Within each block, half of the triplets were the same as used for most of the conditions described so far in this paper; the tone and noise both were presented simultaneously to the right ear and the nSFOAE was extracted from that ear. The remaining triplets involved presenting only the tone to the right (ipsilateral) ear, in which the nSFOAE was measured, while simultaneously presenting the noise to the left (contralateral) ear. The timing and levels of the tone and noise matched those described so far in this paper.3
For maximal control of the situational and subject variables, the ipsilateral- and contralateral-noise conditions were alternated within each block of triplets. One pair of successive triplets involved ipsilateral noise, the next pair involved contralateral noise, and so on, for the remainder of the block. Within each block of triplets, the noise was either wideband, lowpass, bandpass, or highpass. The nSFOAE responses were stored and analyzed separately for the two modes of noise presentation, and data were collected from at least 30 triplets for both the ipsilateral- and the contralateral-noise conditions in each block. Representative data are shown in Fig. 7.
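The alternation rule can be expressed as a mapping from triplet index to noise ear. The sketch below is illustrative only: it assumes 0-based triplet indices within a block, uses the hypothetical labels `"ipsi"` and `"contra"`, and shows how per-triplet responses would be pooled separately for the two modes before averaging.

```python
from collections import defaultdict


def noise_ear(triplet_index):
    """Pairs of successive triplets alternate: 0-1 ipsilateral, 2-3 contralateral, ..."""
    return "ipsi" if (triplet_index // 2) % 2 == 0 else "contra"


def pool_by_mode(responses):
    """Group per-triplet responses (in presentation order) by noise ear so that
    the two modes can be stored and analyzed separately."""
    pooled = defaultdict(list)
    for i, response in enumerate(responses):
        pooled[noise_ear(i)].append(response)
    return dict(pooled)


schedule = [noise_ear(i) for i in range(8)]
# ['ipsi', 'ipsi', 'contra', 'contra', 'ipsi', 'ipsi', 'contra', 'contra']
```

With 60 triplets in a block, each mode receives 30 triplets, the minimum stated above.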
The data in the top panel of Fig. 7 were obtained from those triplets in which the noise was ipsilateral to the tone, and thus represent a replication of the data for subject JZ in the top panel of Fig. 6 collected with the same sample of noise. The two sets of results are similar in that both the wideband and the lowpass noises produced rising, dynamic responses and the bandpass noise did not. Also evident in both data sets is the immediate and sustained diminution in nSFOAE magnitude produced by the highpass noise that we believe to be attributable to lateral suppression.
The data in the bottom panel of Fig. 7 were collected using noises contralateral to the nSFOAE measurement ear, which received just the tone. As can be seen, the wideband and lowpass noises again produced rising, dynamic responses for the tone (in the ipsilateral, tone-alone ear), whereas the bandpass and highpass noises both led to no change in the nSFOAE response from the initial tone-alone interval. This suggests that the sustained, weakened response to ipsilateral highpass noise (top panel of Fig. 7) may have been attributable to lateral suppression (Kemp and Chum, 1980; Shannon, 1976); additional data collection on this point is in progress. Contralateral-noise presentations also were used with subjects NH and KW, and the outcomes were the same as those shown in the bottom panel of Fig. 7. Guinan (2006) also demonstrated the essential equivalence of ipsilateral and contralateral noises on the dynamic behavior of SFOAEs. Note in Fig. 7 that the hesitation between noise onset and the beginning of the rising, dynamic response was slightly longer for the contralateral noises than for the ipsilateral noises, but this was not true for all subjects.
The realization that lateral suppression operates from a higher-frequency region onto the 4.0‐kHz region of our nSFOAE measurement explains the momentary declines in nSFOAE magnitude often seen immediately after the onset of a wideband noise (see Figs. 5 and 6, and later figures). These declines can be attributed to the fast action of lateral suppression, which is then gradually offset by an opposing mechanism triggered by the wideband noise.
Graded nSFOAE responses
When only the tone or only the wideband noise was presented, the nSFOAE responses were graded in magnitude according to the level of the tone or noise (Fig. 4). When tone-plus-noise was presented, the magnitude of the asymptotic nSFOAE response could be substantially larger than to either tone-alone or noise-alone, and the response also differed dynamically from the responses to tone-alone and noise-alone (Fig. 5). The contributions of stimulus level to nSFOAE magnitude were further explored in an attempt to gain further insight into the underlying mechanisms.
For one set of measurements, the level of the wideband noise was held constant (at 25 dB spectrum level) while the level of the tone was manipulated across blocks of triplets. The result was that the asymptotic magnitudes of the rising nSFOAE response also were graded according to the level of the tone (see top panel of Fig. 8). It is important to realize that, in these tone-plus-noise conditions, nSFOAE responses were being produced to both the tone and the noise, and the output of the 400‐Hz analysis filter contained both nSFOAE components. As the tone level was decreased, the contribution of the noise to the overall nSFOAE began to predominate; accordingly, the responses at 40 and 50 dB became more variable and also more similar in magnitude both to each other and to the noise-alone responses seen in Figs. 2 and 3.
For another set of measurements, the level of the tone was held constant (at 60 dB SPL) while the level of the wideband noise was manipulated across blocks of triplets. The result was that the maximum magnitude of the nSFOAE response changed relatively little over a 20‐dB change in the spectrum level of the wideband noise (see bottom panel of Fig. 8). Note that for the highest noise level tested, the rising, dynamic response disappeared in this subject; that is, the response to the tone-plus-noise was nearly constant throughout the 400‐ms duration of tone-plus-noise. This absence of a rising, dynamic response with the 35‐dB noise level was seen for other subjects as well, even when the level of the tone was raised to 66 dB in the 35‐dB noise condition. [Note that the nSFOAE response to noise-alone (Fig. 2) also was essentially constant once the spectrum level of the noise was raised to 35 dB.] Some subjects did show a rising, dynamic response with the 15‐dB noise level (like NH in Fig. 8), but other subjects showed an essentially flat nSFOAE, resembling the response to tone-alone.
For both of the manipulations in Fig. 8, it is important to remember that the nSFOAE response is the magnitude of the waveform required to make the sound in the ear canal during the two-earphone presentation equal to the linear sum of the sounds from the two single-earphone presentations. The top panel of Fig. 8 shows that this “correction waveform” (the nSFOAE) must increase in magnitude as the level of the tone is increased, and the bottom panel of Fig. 8 shows that the correction waveform is not much affected by the level of the noise. A way to think about these effects in terms of the input∕output functions of the cochlea is described below and in the second paper in this series.n1
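The definition of the correction waveform can be made concrete with a toy calculation. The sketch assumes a memoryless power-law compression, y = sign(x)·|x|^c, as a stand-in for the cochlea (an illustrative assumption, not the authors' model); the two single-earphone presentations deliver the same stimulus s, and the two-earphone presentation delivers 2s. For an additive (linear) system the residual (r1 + r2) − r3 vanishes exactly, whereas any compression leaves a nonzero residual.

```python
import math


def cochlea(x, exponent):
    """Hypothetical memoryless compressive nonlinearity; exponent 1.0 is linear."""
    return math.copysign(abs(x) ** exponent, x)


def nsfoae_residual(stimulus, exponent):
    """Correction waveform for one triplet: (r1 + r2) - r3. r1 and r2 are the responses
    to the two (acoustically equivalent) single-earphone presentations; r3 is the
    response to the two-earphone presentation (twice the pressure, i.e. +6 dB)."""
    r1 = [cochlea(s, exponent) for s in stimulus]
    r2 = [cochlea(s, exponent) for s in stimulus]
    r3 = [cochlea(2.0 * s, exponent) for s in stimulus]
    return [a + b - c for a, b, c in zip(r1, r2, r3)]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


# One cycle of a sinusoid sampled at 32 points (arbitrary units).
tone = [math.sin(2 * math.pi * t / 32) for t in range(32)]
linear = rms(nsfoae_residual(tone, 1.0))        # additive system: residual is exactly 0
compressive = rms(nsfoae_residual(tone, 0.3))   # compressive system: residual > 0
```

In the actual measurement the residual is further bandpass filtered around the tone frequency and its magnitude is tracked over time; the toy omits those steps.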
Recovery following noise termination
Because the nSFOAE to tone-plus-noise is strong (Fig. 5) and the nSFOAE to tone-alone is weak (Fig. 1), it is logical to expect that if the tone were to outlast the duration of the noise, then the nSFOAE to the tone should begin to weaken, perhaps rapidly, and move toward the response magnitude seen with tone-alone. A test of that expectation proved it wrong. For all subjects, after the 400‐ms noise was terminated, the nSFOAE response to a 60‐dB tone remained strong, and approximately the same magnitude as during the noise, with the return to tone-alone values requiring hundreds of milliseconds after noise termination. To further explore this persistence of the nSFOAE response following termination of the noise, we varied the duration of the noise from 25 to 200 ms while retaining the duration of the tone at 500 ms. The results are shown in Fig. 9.
As Fig. 9 reveals, when the wideband noise lasted 100 or 200 ms, the elevated nSFOAE response to the tone persisted unabated through the remainder of the 500‐ms observation period, similar to the persistence seen when the duration of the noise was 400 ms and the tone outlasted it (not shown). When the noise duration was shortened to 50 or 25 ms, the nSFOAE response to the tone did decline somewhat, but it still was slightly elevated at the end of the 500‐ms tone relative to the initial tone-alone period. For some other subjects, the response to the tone following a 25‐ms noise did decline to the tone-alone value by the end of the 500‐ms observation period. Thus, there are conditions in which the nSFOAE response to the tone does decline gradually following the termination of the noise. However, the protracted persistence in the other conditions is more notable than the fact that, in extreme conditions, the response does recover in accord with intuition. Apparently the wideband noise (or at least its low-frequency region) is necessary to activate the mechanism responsible for the gradual rise in the nSFOAE response to tone-plus-noise (tone-alone is inadequate to activate that mechanism; see Fig. 1), but once the mechanism has been activated, the nSFOAE response can persist for hundreds of milliseconds. When straight lines were fitted to the segments of the responses between 225 and 500 ms in Fig. 9, the slopes corresponded to declines of about 1 dB per 100 ms for both the 25- and 50‐ms noises.
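The straight-line fits just described amount to an ordinary least-squares slope over the 225–500 ms segment. A minimal sketch, assuming the response is available as times in milliseconds and magnitudes in dB; the function name and the synthetic response (a decline of exactly 1 dB per 100 ms) are hypothetical.

```python
def ols_slope(times_ms, values_db, t_start=225.0, t_end=500.0):
    """Ordinary least-squares slope (dB per ms) of the samples with t_start <= t <= t_end."""
    pts = [(t, v) for t, v in zip(times_ms, values_db) if t_start <= t <= t_end]
    if len(pts) < 2:
        raise ValueError("need at least two samples in the fitting window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    return sxy / sxx


# Synthetic response: flat at -10 dB, then declining 1 dB per 100 ms from 225 ms on.
times = [float(ms) for ms in range(0, 501, 5)]
levels = [-10.0 if ms < 225 else -10.0 - (ms - 225.0) / 100.0 for ms in times]
slope_db_per_100ms = 100.0 * ols_slope(times, levels)   # close to -1.0
```

A negative slope indicates recovery toward the tone-alone value; a slope near zero corresponds to the unabated persistence seen with the longer noises.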
In passing we note that the responses during the tone-alone periods that preceded noise onset were very similar in magnitude for all noise durations in Fig. 9 (and in other subjects). This suggests that the 500‐ms silent intervals between successive stimulus presentations in each triplet were adequate to allow recovery of whatever physiological mechanism is responsible for the rising, dynamic response to the tone-plus-noise.
Notice in Fig. 9 that a 25‐ms burst of noise was adequate to initiate an increase in magnitude of the nSFOAE response even though that noise duration did not exceed the approximate duration of the hesitation present in all the responses shown in Fig. 9. That is, once triggered, the rising nSFOAE response takes time to develop.
Short vs long tone durations
All of the data shown so far were collected using tones having durations of 400 ms or longer. Data also were collected using 10‐ms tones presented at different delays relative to the onset of the noise. One basic question was whether probe tones of this sort would be adequate to reveal the functional relationships shown in Figs. 5 and 6. The answer was yes: whatever mechanisms were activated by the noise produced nSFOAE responses to short tones that were highly similar to those for long tones, both in magnitude and in time course. This is further evidence that the relatively intense, long-duration tone used in all preceding demonstrations contributed little or nothing to the initiation of the dynamic response to that tone.
A comparison of short- and long-duration tones is provided in Fig. 10. The close agreement between the responses is representative of the data for other subjects. Although not shown, short tones also behaved similarly to a long tone in the time period following the offset of the wideband noise (see Fig. 9 above). Thus, short tones apparently can be used safely as probes of the state of the dynamic process illustrated in Figs. 5, 6, 8, and 9. For the purposes of Fig. 10, the duration of the rectangular analysis window was shortened from 20 to 10 ms in order to match the duration of the 10‐ms tone bursts. This brief window accounts for the greater variability in the long-tone data relative to Figs. 5, 6, and 8.
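The rectangular-window analysis can be sketched as a sliding RMS of the (already filtered) residual, converted to dB. This is a minimal sketch; the sampling rate, the 5‐ms hop, and the dB reference are assumptions, not parameters reported for the study. Halving the window from 20 to 10 ms halves the number of samples averaged in each estimate, which is why the 10‐ms estimates are noisier.

```python
import math


def windowed_level_db(x, fs, window_ms, hop_ms, ref=1.0):
    """RMS level (dB re `ref`) of x in successive rectangular windows.
    Returns a list of (window-center time in ms, level in dB)."""
    win = int(round(fs * window_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    out = []
    for start in range(0, len(x) - win + 1, hop):
        seg = x[start:start + win]
        rms = math.sqrt(sum(v * v for v in seg) / win)
        center_ms = 1000.0 * (start + win / 2.0) / fs
        # Guard against log10(0) for an all-zero segment.
        out.append((center_ms, 20.0 * math.log10(max(rms, 1e-300) / ref)))
    return out


# A 4.0-kHz sinusoid of amplitude sqrt(2) has RMS 1, i.e. 0 dB re 1, with either
# window length (both windows span whole cycles at this sampling rate).
fs = 48000.0
sig = [math.sqrt(2) * math.sin(2 * math.pi * 4000.0 * n / fs) for n in range(int(0.1 * fs))]
levels_20 = windowed_level_db(sig, fs, window_ms=20.0, hop_ms=5.0)
levels_10 = windowed_level_db(sig, fs, window_ms=10.0, hop_ms=5.0)
```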
DISCUSSION
The purpose of this report was to describe the results obtained when a new, nonlinear procedure for measuring SFOAEs was used with stimuli of the sort commonly employed in psychoacoustical studies of such phenomena as simultaneous and forward masking. In a companion paper,n1 nSFOAEs and behavioral performance were measured using the same acoustic waveforms, the same subjects, and the same ears.
To summarize the results of this report, when the stimulus was a 4.0‐kHz test tone presented alone, the nSFOAE response increased monotonically in magnitude as tone level was increased, but for all levels, the nSFOAE magnitude was essentially constant throughout the 400‐ms duration of the tone (see Fig. 1). When the stimulus was a wideband noise alone, the magnitude of the response again varied directly with level, but the form of the response differed depending upon noise level. At moderate noise levels for most subjects, there was a gradual increase in the nSFOAE response followed by a segment with essentially constant magnitude (see Figs. 2 and 3). When the tone and the wideband noise were presented simultaneously, the result seemingly was synergistic in that the maximum magnitude of the nSFOAE response was much greater than would be expected from simple additivity of the responses to tone-alone and noise-alone (see Fig. 5). This dynamic response showed an initial hesitation of about 25 ms followed by a rapid rise in the response that required about 100 ms to complete; then the nSFOAE response was approximately constant until the end of the 400‐ms noise presentation (see Figs. 5, 6, 8, and 10). During the hesitation period, the nSFOAE often showed an immediate, sharp decline in magnitude that was not present when the noise was only in the contralateral ear or when an ipsilateral noise was lower in frequency than the tone, suggesting that the sharp decline is attributable to lateral suppression operating from a higher-frequency region onto the tone. Contralateral wideband and lowpass noises did produce a rising, dynamic response to the ipsilateral tone like that seen with ipsilateral noises.
Subsequent manipulations revealed that it was primarily the low-frequency components of the noise that were responsible for the rising, dynamic segment of the nSFOAE response; highpass and bandpass versions of the noise were incapable of producing the dynamic response, but rather left the magnitude of the nSFOAE close to or weaker than the level of response to the tone alone (see Fig. 6). Changing the tone to 2.5 kHz, and changing the cutoff frequencies for the various noise bands accordingly, demonstrated that these bandwidth-manipulation outcomes were a matter of relative, not absolute, frequency. When the tone was continued past the termination of the noise, the nSFOAE response remained elevated in magnitude for hundreds of milliseconds (see Fig. 9). Finally, when the long-duration tone was replaced by a 10‐ms tone burst during or after the noise presentation, the nSFOAE responses were essentially identical to those observed with the long-duration tone (see Fig. 10).
Comparison of SFOAE methods
The fact that the dynamic nSFOAE response to tone-plus-noise shows a 25‐ms hesitation at noise onset and is well characterized by a short time constant suggests that this measure is strongly related to the similar OAE measures long studied by Guinan and his colleagues (e.g., Guinan, 2006; Guinan et al., 2003; Backus and Guinan, 2006; Lilaonitkul and Guinan, 2009a, 2009b). Thus, the rising, dynamic segment of the nSFOAE response likely is attributable to the medial olivocochlear (MOC) reflex that Guinan so elegantly demonstrated to underlie the dynamic response he saw in a version of the SFOAE extracted using a heterodyne procedure. Although Guinan typically presented the noise to the contralateral ear, typically used a diotic 1.0‐kHz tone, and did not employ a triplet procedure, among other differences from our procedure, nevertheless he did extract a perstimulatory SFOAE from the sound in the ear canal, so the existence of similarities across the two data sets is encouraging. The similarities can be summarized as follows:
Both procedures yield a constant response to a tone presented alone (which Guinan called the SFOAE), and a rising response to tone-plus-noise (which Guinan called the ΔSFOAE; see Guinan et al., 2003; Backus and Guinan, 2006). [The numerous differences in procedure have led us not to adopt the Guinan terminology for fear of possibly confusing (clearly related) phenomena.]
After the onset of an ipsilateral noise, there is a hesitation of about 25 ms before the beginning of the rising, dynamic response (see Figs. 5, 6, and 9; Backus and Guinan, 2006).
Although Guinan and colleagues (e.g., Guinan, 2006; Backus and Guinan, 2006) were able to extract multiple time constants from their measures, presumably attributable to multiple underlying mechanisms, the ears with the strongest dynamic responses did reveal an initial, fast time constant much like those shown by our subjects. The fast time constants reported by Guinan and colleagues were about 60–80 ms; ours were about 23–39 ms when the exponential function was fitted to the response obtained with our typical 20‐ms analysis window and about 70–90 ms when fitted to the response obtained with a 40‐ms window. By comparison, Backus and Guinan (2006) noted that a time constant of about 70 ms emerges from a wide array of different measures of efferent activity and thus may represent a “fundamental time constant” of the MOC system.
When a sample of typical nSFOAE responses was analyzed with a heterodyne procedure similar to Guinan’s, the form of the data, including the hesitation, the rising, dynamic response, the magnitude of the rise, and the fitted time constants were remarkably similar to the values obtained with our standard analysis procedure.
The nSFOAE response sometimes shows large variation across test sessions (described below), and Lilaonitkul and Guinan (2009a) reported the same.
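A heterodyne (complex-demodulation) analysis of the general kind referred to above can be sketched as follows: multiply the recorded residual by a complex exponential at the tone frequency, then average over a rectangular window, which acts as a crude lowpass filter, to obtain a magnitude and a phase for each window. This is a generic textbook version, assuming a known tone frequency and non-overlapping averaging windows; it is not a reproduction of Guinan's processing chain, and the synthetic test signal is hypothetical.

```python
import cmath
import math


def heterodyne(x, fs, f0, window_ms):
    """Complex demodulation of x at f0 (Hz). Each output frame is the mean of
    x[n] * exp(-j*2*pi*f0*n/fs) over a non-overlapping rectangular window;
    for a component A*cos(2*pi*f0*t + phi), 2*|z| recovers A and angle(z) recovers phi."""
    win = int(round(fs * window_ms / 1000.0))
    mixed = [v * cmath.exp(-2j * math.pi * f0 * n / fs) for n, v in enumerate(x)]
    frames = []
    for start in range(0, len(mixed) - win + 1, win):
        z = sum(mixed[start:start + win]) / win
        frames.append((2.0 * abs(z), cmath.phase(z)))
    return frames


# Synthetic signal: a 4.0-kHz component with amplitude 0.5 and phase 0.3 rad.
fs = 48000.0
x = [0.5 * math.cos(2 * math.pi * 4000.0 * n / fs + 0.3) for n in range(4800)]
frames = heterodyne(x, fs, 4000.0, window_ms=20.0)
```

With real recordings, the averaging window trades time resolution against the amount of averaging, the same trade-off that governs the rectangular analysis windows used in the standard nSFOAE analysis.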
There also are some differences in the outcomes obtained with Guinan’s and our procedures:
According to Guinan (2006, p. 601), when the bandwidth of the noise was manipulated, significant MOC effects could be observed (at least “in some cases”) with noise bands located either well below or well above their 1.0‐kHz tone. Lilaonitkul and Guinan (2009a) also reported that lowpass noise was more effective than highpass noise at producing a dynamic change in their SFOAE measure, but the most effective noises were those centered on the test tone. By comparison, both our bandpass and highpass noises were ineffective at triggering a rising, dynamic response. Some of this difference surely is attributable to Lilaonitkul and Guinan’s use of half-octave bandwidths compared to the narrower bandpass noise used here, but the failure of our highpass noise to initiate a rising, dynamic response is unlikely to be simply a bandwidth effect.
During the hesitation period prior to the beginning of the rising, dynamic response, the nSFOAE sometimes shows an immediate, sharp decline following noise onset (see Figs. 5, 6, and 9). Multiple lines of evidence suggest that this decline is attributable to lateral (two-tone) suppression acting on the tone from components of the noise higher in frequency than the tone (see Figs. 6 and 7). Guinan and colleagues took steps to avoid complications from suppression in their measurements (e.g., Lilaonitkul and Guinan, 2009a, 2009b). Commonly they employ a tonal suppressor on the low-frequency side of their test tone, and their measurement window is located after the suppression has decayed. In other studies, for their ipsilateral conditions, their MOC-eliciting noises excluded the band of frequencies surrounding the tone that is capable of producing suppression on the tone. For a 1.0‐kHz target tone, Backus and Guinan (2006) estimated that this spectral notch had to be about 2.1 octaves wide.
Although Guinan and colleagues reported similar values of hesitation for ipsilateral- and contralateral-noise bands (Backus and Guinan, 2006), the hesitation for the nSFOAE response was longer for contralateral- than ipsilateral-noise bands for some of our subjects (see Fig. 7).
Following noise termination, the Guinan response decays relatively rapidly (Backus and Guinan, 2006), unlike the marked persistence seen for the nSFOAE response (see Fig. 9). Note that in primary auditory nerve fibers, the onset of efferent effects also is considerably faster than their offset (Wiederhold and Kiang, 1970).
The dynamic response observed by Backus and Guinan (2006, Fig. 5) increased in magnitude with increases in noise level, whereas for the nSFOAE, the level of the noise appears to matter less than the level of the tone (compare top and bottom panels of Fig. 8).
We have yet to see any subjects having the medium time constants (about 330 ms) reported by Backus and Guinan (2006).
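To put the 2.1‐octave notch estimate of Backus and Guinan (2006) mentioned above in frequency terms, the sketch below assumes that the notch is centered geometrically (on a log-frequency axis) on the 1.0‐kHz tone; that centering is an assumption made for illustration, since only the width is given here.

```python
def notch_edges(f_center_hz, width_octaves):
    """Edges of a spectral notch of the given width in octaves,
    assumed to be centered geometrically on f_center_hz."""
    half = width_octaves / 2.0
    return f_center_hz * 2.0 ** (-half), f_center_hz * 2.0 ** half


lo, hi = notch_edges(1000.0, 2.1)   # roughly 483 Hz and 2071 Hz
```

Under this assumption, an ipsilateral elicitor for a 1.0‐kHz tone would have to omit roughly 0.48–2.07 kHz, a much larger spectral region than is affected by the noise manipulations used in the nSFOAE procedure.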
Our procedure for obtaining nSFOAEs has several potential advantages compared to Guinan’s measure: when the noise is ipsilateral, it does not need to be notched around the frequency of the tone; short-duration tones can be used as easily as long-duration tones; and the sound of interest can be more complex than a tone. Also, the nSFOAE procedure allows an investigator to work easily in whatever frequency region provides the best signal-to-noise ratio. A potential weakness of the nSFOAE measure is that it depends upon the SFOAE behaving nonlinearly. Our procedure eliminates the stimulus waveform plus all components of the cochlear response that are linear, leaving only those components that do not obey strict additivity. Thus, our procedure reveals properties of the behavior of the underlying SFOAE (e.g., in response to changes in tone and∕or noise level), as opposed to the actual magnitude of the underlying SFOAE, which is revealed only indirectly. These characteristics of the nSFOAE response may be an advantage in situations where the linear response obscures a phenomenon of interest, but a disadvantage in situations where the actual magnitude of the underlying SFOAE is relevant.
The procedure used here also has marked similarities and differences with the procedure used by Keefe et al. (2009), but a detailed comparison is deferred until the second paper in this series.n1
Effects of noise on OAEs
Other investigators also have studied the effect of noise bands of various sorts on an OAE produced by a tone. Unlike the present study, Maison et al. (2001) used a transient-evoked OAE (TEOAE), their noise bands were in the ear contralateral to the tone, the effect observed was a diminution in the magnitude of a TEOAE, and those diminutions typically were only 1 dB or less in magnitude. Also unlike the present study, the noise bands producing the greatest attenuation of the TEOAE were those centered on the frequency of the tone (either 1.0 or 2.0 kHz), with little evidence of an asymmetry in the effectiveness of noise bands below or above the tone frequency. Apparently, TEOAEs are not affected by noise in the same way as the perstimulatory nSFOAE described here. The time course of this contralateral attenuation was about 60 ms (Maison et al., 2001).
Both Kim et al. (2001) and Bassim et al. (2003) measured decreases of about 1–2 dB in a distortion-product OAE when noise was presented to the contralateral ear (this effect is much weaker in humans than in other mammalian species; see discussion by Guinan, 2006). The time constants fitted to those declines showed large individual differences, but generally were about 70 ms, which is similar to the time constants reported by Backus and Guinan (2006) and to ours when a 40‐ms analysis window was used (see above). Other work involving the effects of noise on OAEs is discussed in the companion paper.n1
Middle-ear reflex
The parameters for the tones and noise used here were chosen in large part because of our eventual interest in certain psychoacoustical phenomena such as auditory masking.n1 As a result, the noise levels used may have been sufficient to activate the middle-ear reflex (MER) in some subjects. In the most commonly used condition, the noise levels were 63 and 69 dB SPL for the single-earphone and two-earphone presentations of each triplet, respectively, the larger of which is close to the nominal MER threshold for broadband noise (Wilson and Margolis, 1999). In adult ears, Schairer et al. (2007) found that the range of broadband noise levels capable of eliciting a statistically significant change in acoustic admittance in the ear canal was 64–80 dB Hearing Level. These facts make it logically possible that some or all of the effects reported here were attributable to the MER rather than to cochlear mechanisms and the MOC system.
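The 6‐dB step between the single- and two-earphone levels follows from the two earphones delivering identical waveforms, whose pressures add: doubling the pressure raises the level by 20·log10(2) ≈ 6.02 dB. Had the two earphones delivered independent noises, their powers would add instead, for an increase of only about 3 dB. The helper below is a minimal sketch of that arithmetic; its name and interface are illustrative.

```python
import math


def combined_level_db(level_db, n_sources, coherent=True):
    """Level of n sources of equal level. Identical, in-phase waveforms add in
    pressure (+20*log10(n)); independent waveforms add in power (+10*log10(n))."""
    factor = 20.0 if coherent else 10.0
    return level_db + factor * math.log10(n_sources)


two_earphones = combined_level_db(63.0, 2)                  # about 69.02 dB SPL
independent = combined_level_db(63.0, 2, coherent=False)     # about 66.01 dB SPL
```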
We are not able to rule out unequivocally a potential influence of the MER on our nSFOAE measures, but several facts suggest that those influences were minimal. (1) The MER primarily acts to attenuate the transmission of low frequencies through the middle ear (Dallos, 1973; Goodman and Keefe, 2006; Schairer et al., 2007). In the present study, the frequency of the probe tone and the frequency at which the nSFOAE was measured were both 4.0 kHz. Schairer et al. (2007) found that above 2.0 kHz, shifts in acoustic admittance were not significantly different from zero for any level of noise activator. Also, shifts in acoustic reflectance from the tympanic membrane were positive (more energy reflected back into the ear canal) only for frequencies below 1.26 kHz. For frequencies between 1.26 and 5.0 kHz, the shift in reflectance was negative, implying that the transmission of these frequencies improved following a broadband noise activator. (2) Higher frequencies are more effective elicitors of the MER than are lower frequencies (Dallos, 1973). When the bandwidth of the noise was manipulated here, the high-frequency components of the noise were found not to be effective elicitors of the dynamic tone-plus-noise response, while the low-frequency components were effective elicitors. This is the reverse of what would be expected if the rising, dynamic segment of our nSFOAE response were attributable primarily to the MER. (3) The onset latency of the MER is commonly taken to be about 100 ms for a strong stimulus, with shorter latencies at even higher levels (Dallos, 1973; Church and Cudahy, 1984). Goodman and Keefe (2006) measured SFOAEs to a high-frequency probe tone while simultaneously measuring the middle-ear reflex using a low-frequency probe tone. They classified a shift in response to their low-frequency probe as having an MER origin (i.e., noncochlear) only if the onset latency was ≥70 ms.
By comparison, the rising, dynamic segment of our nSFOAE response begins well before 70–100 ms, and the time constants we obtained were similar to those reported by Backus and Guinan (2006) and Kim et al. (2001), both of whom took rigorous steps to rule out contributions from the MER in their investigations of the temporal characteristics of the MOC response. (4) The change in acoustic impedance attributable to the MER increases as the duration of the eliciting stimulus increases, at least up to approximately half a second (Church and Cudahy, 1984). By comparison, our nSFOAE response to tone-plus-noise reaches its asymptote after about 100 ms even when the durations of the tone and noise are as long as half a second. (5) Goodman and Keefe (2006) showed that when the MER was observable at a low frequency, there sometimes was a coincident rapid shift in the magnitude of their high-frequency SFOAE that followed approximately the same time course as the MER. In contrast, once our nSFOAE response had reached its asymptote, no additional rapid changes in response magnitude were ever observed. (6) In subjects NH and KW, a 66‐dB tone was varied in 5‐Hz steps around 4.0 kHz in the presence of a wideband noise of 25 dB spectrum level. When the phase of the nSFOAE response could be estimated reliably using a heterodyne procedure similar to Guinan’s, the phase did shift systematically with frequency. Guinan used outcomes of this sort to rule out strong contributions from the middle-ear reflex to the dynamic component of his SFOAE response (see Guinan et al., 2003). (7) Using a procedure they regarded as superior to Guinan’s measure of phase shift with frequency, Keefe et al. (2009) reported no evidence of an MER with noises having overall levels of about 69 and 81 dB, compared to the 63 and 69 dB typically used here.
(8) In one subject, clear evidence of activation of the MER was observed using noise levels comparable to those reported here; specifically, her nSFOAE to tone-plus-noise oscillated unlike anything seen in the responses of the subjects reported here. This individual difference led us to exclude this subject from further measurements.
Taken together, these facts suggest that the MER was not the primary source of the rising, dynamic response reported here for the tone-plus-noise conditions. However, the work of Goodman and Keefe (2006) suggests caution with interpretation until additional tests are developed to rule out MER effects completely.
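Item (6) above rests on the phase of the response shifting systematically with frequency. One common way to summarize such a shift is the phase-gradient delay, τ = −(1/2π)·dφ/df, taken from the least-squares slope of the unwrapped phase across closely spaced frequencies (here, 5‐Hz steps). The sketch below is illustrative: it assumes phases in radians and uses synthetic phases generated from an assumed 8‐ms delay, not measured data.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps between successive phase samples (radians)."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2.0 * math.pi * round(d / (2.0 * math.pi))
        out.append(out[-1] + d)
    return out


def phase_gradient_delay(freqs_hz, phases_rad):
    """Delay (s) from the least-squares slope of unwrapped phase vs frequency."""
    ph = unwrap(phases_rad)
    n = len(freqs_hz)
    mf = sum(freqs_hz) / n
    mp = sum(ph) / n
    slope = (sum((f - mf) * (p - mp) for f, p in zip(freqs_hz, ph))
             / sum((f - mf) ** 2 for f in freqs_hz))
    return -slope / (2.0 * math.pi)


# Synthetic phases for an assumed 8-ms delay, wrapped into (-pi, pi],
# at 5-Hz steps around 4.0 kHz (illustration only).
freqs = [4000.0 + 5.0 * k for k in range(-10, 11)]
wrapped = [math.atan2(math.sin(-2 * math.pi * f * 0.008), math.cos(-2 * math.pi * f * 0.008))
           for f in freqs]
delay_s = phase_gradient_delay(freqs, wrapped)   # close to 0.008
```

The unwrapping step assumes the true phase change between adjacent frequencies is smaller than π, which is why closely spaced frequency steps are needed; a delay estimate near zero would indicate little systematic phase change with frequency.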
MOC system and cochlear input∕output functions
Following the lead of von Klitzing and Kohlrausch (1994) and Strickland (2004), these results can be explained by assuming that the MOC system operates to modulate the gain of the cochlear amplifiers and hence the form of the input∕output function of the cochlea (also see Oxenham and Bacon, 2004). The presumption is that, in its initial, resting state, the gain of the cochlear-amplifier system is high and the relevant input∕output function is highly compressive for the middle range of sound-pressure levels. When only a single tone is turned on, there is no activation of the MOC system, and thus no change in gain or in the amount of compression; for tone-alone, the initial input∕output function is relevant for all three presentations of each triplet. When the tone is accompanied by a noise, such as the wideband and lowpass noises used here, some mechanism is activated (presumably the MOC efferent system; see Guinan et al., 2003; Guinan, 2006) that, after a short delay, leads to a decrease in the gain of the cochlear amplifiers and an accompanying shift toward input∕output functions that are less compressive than the initial function. The two single-earphone presentations of each triplet for tone-plus-noise, being acoustically equivalent, are processed identically by the same input∕output function, but the final, two-earphone presentation of each triplet is processed by an input∕output function that has slightly lower gain and is slightly less compressive. The result is that, after a short hesitation, the nSFOAE response for tone-plus-noise shows the characteristic dynamic rise seen in Fig. 5 (and in the data of Backus and Guinan, 2006), and that response remains high until the MOC system gradually relaxes (following stimulus offset) and allows the cochlear gain to return to its initial state. This mechanism is discussed in more detail in the companion paper.n1
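The account above can be illustrated with a deliberately simple toy model. Assume, for illustration only (this is not the model of the companion paper), a compressive input∕output function y = g·x^c with arbitrary units and arbitrary parameter values. The two single-earphone presentations are processed with the resting gain, and the two-earphone presentation, with twice the input, is processed with a reduced gain standing in for MOC activation; the accompanying change in compression is left out to keep the example minimal.

```python
def io_function(x, gain, exponent):
    """Hypothetical compressive input/output function (arbitrary units)."""
    return gain * x ** exponent


def triplet_residual(x, rest, third):
    """|r1 + r2 - r3|: r1 and r2 come from the resting function at input x;
    r3 comes from the (possibly MOC-modified) function at input 2x."""
    g0, c0 = rest
    g3, c3 = third
    return abs(2.0 * io_function(x, g0, c0) - io_function(2.0 * x, g3, c3))


x = 1.0
rest = (1.0, 0.30)                                  # resting gain and exponent
no_moc = triplet_residual(x, rest, rest)            # same function for all three
with_moc = triplet_residual(x, rest, (0.8, 0.30))   # third presentation: 20% lower gain
```

With these values the residual grows from about 0.77 to about 1.02 when the gain for the third presentation is reduced. The numbers carry no physiological meaning; the point is only that a gain change confined to the two-earphone presentation increases the departure from additivity.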
Our bandwidth-manipulation results revealed an important point that deserves mention in the context of this MOC explanation of our nSFOAE outcomes. For both 4.0- and 2.5-kHz tones, and for both ipsilateral and contralateral noises, the lowpass noise was capable of activating the rising, dynamic response, but the bandpass and highpass noises were not. This suggests that the rising, dynamic segment of the nSFOAE response is initiated and controlled by a mechanism that operates relatively locally along the basilar membrane. Namely, each local frequency region apparently is controlled by a frequency region lying just below (apical to) it. The width of that lower-frequency region has yet to be defined, but it appears to be relatively wide, and the width is likely to depend upon the absolute frequency. A wideband sound apparently is capable of activating all those local mechanisms, but each mechanism controls activity only in a frequency region lying slightly above (basal to) it. Even a relatively wide noise band apparently is incapable of initiating a rising, dynamic response in a frequency region below that band. These facts carry strong implications for the wiring of the underlying mechanism, be it the MOC system or some other mechanism. Also note that, when a wideband noise is presented with a tone, there seem to be two opposing forces acting simultaneously on the frequency region containing the tone. Frequency components lying above the tone exert a fast-acting suppressive effect on the frequency region of the tone, while frequency components lying below the tone act more slowly to initiate a rising, dynamic response in the frequency region of the tone (see Fig. 6). The interaction of these opposing forces is not simple, however. If a simple algebraic summation were at work, then one would expect the nSFOAE response with the lowpass noise to be stronger than that with the wideband noise, and that did not happen (see Fig. 6). An understanding of this interaction awaits further work.
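The "simple algebraic summation" expectation in the paragraph above can be made concrete with a toy model. The sketch assumes, hypothetically, that components above the tone contribute a constant, fast suppression and that components below the tone contribute an enhancement rising with a single time constant; every number is illustrative and none is fitted to data. Under pure summation, removing the above-tone components (lowpass noise) must give a larger effect than wideband noise at every moment, which is the prediction that Fig. 6 did not bear out.

```python
import math


def summed_effect(t_ms, enhancement_db, suppression_db, tau_ms):
    """Hypothetical algebraic-summation model of the noise effect (dB) at time t.

    enhancement_db: asymptotic slow enhancement from below-tone components.
    suppression_db: fast, constant suppression from above-tone components.
    tau_ms: time constant of the slow enhancement.
    """
    slow_rise = enhancement_db * (1.0 - math.exp(-t_ms / tau_ms))
    return slow_rise - suppression_db


TAU_MS = 20.0  # illustrative value only
times = [10, 50, 100, 200, 300]  # ms after noise onset
# Wideband noise: both below-tone and above-tone components present.
wideband = [summed_effect(t, 6.0, 2.0, TAU_MS) for t in times]
# Lowpass noise: only below-tone components, so no suppression term.
lowpass = [summed_effect(t, 6.0, 0.0, TAU_MS) for t in times]
for t, w, lp in zip(times, wideband, lowpass):
    print(f"t = {t:3d} ms  wideband {w:5.2f} dB  lowpass {lp:5.2f} dB")
```

In this model the lowpass curve sits above the wideband curve by exactly the suppression term at every time point; the fact that the measured responses did not show this ordering is what rules out simple summation.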
Weaknesses of the nSFOAE measure
Although the stimulus procedure and extraction method described here have produced generally stable and consistent nSFOAE responses, occasional irregularities have been observed. For example, on one occasion, at the end of a long test session, subject SC suddenly showed no rising, dynamic response to the tone-plus-noise stimulus even though she routinely had exhibited that response previously, including earlier that same session. On this occasion, various checks on the apparatus and the procedures revealed nothing awry, and in subsequent sessions, SC’s dynamic response again was strong and consistent. In another case, subject JZ was clearly drowsy one day, and his rising, dynamic response was noticeably weaker than typical for him. After a break that included walking and hydration, his dynamic response returned to its typical strength (see Fig. 11). In yet another case, subject KW exhibited no rising, dynamic response to tone-plus-noise when tested soon after participating in a fatiguing sports activity, yet his response returned to its typical value in subsequent sessions. These examples suggest that fatigue of physiological, cognitive, or cochlear origin can alter the mechanisms responsible for the typical cochlear response to tone-plus-noise. These examples also point out a strength of the repeated-measures strategy used in this study; without knowledge from prior and subsequent test sessions, incorrect conclusions surely would have been drawn from these anomalous episodes.
The magnitude of the nSFOAE measured in response to the same stimulus also can vary considerably, both within and across test sessions. Figure 11 contains seven nSFOAE responses to tone-plus-noise collected from the same ear over four test sessions. As can be seen, the rising, dynamic segments initiated by the onset of the noise differed in maximum magnitude (although not much in shape). For this subject, the standard deviation of nSFOAE magnitude across tests was similar at the 75- and 300-ms points following noise onset, suggesting that response magnitude simply differed by a constant across tests. When time constants were estimated for the six gray traces at the top of Fig. 11, the range was 17.2–25.4 ms, and the mean value was 22.3 ms.
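The text does not specify how the time constants were estimated. One common approach, sketched here as an assumption rather than as the authors' procedure, is a least-squares fit of a saturating exponential, m(t) = baseline + amplitude * (1 - exp(-t/tau)), to the rising segment of each trace using `scipy.optimize.curve_fit`. The synthetic trace below is generated with tau = 22 ms only to check that the fit recovers it; it is not data from Fig. 11.

```python
import numpy as np
from scipy.optimize import curve_fit


def rising_exponential(t, baseline, amplitude, tau):
    """Saturating exponential rise; t and tau in ms, magnitudes in dB."""
    return baseline + amplitude * (1.0 - np.exp(-t / tau))


def estimate_time_constant(t_ms, magnitude_db):
    """Fit the rising segment of one trace and return tau in ms."""
    # Initial guesses: starting level, overall rise, mid-range time constant.
    p0 = (magnitude_db[0], magnitude_db[-1] - magnitude_db[0], 20.0)
    # Keep tau positive and finite so the exponential stays well behaved.
    bounds = ([-np.inf, -np.inf, 0.1], [np.inf, np.inf, 1000.0])
    params, _ = curve_fit(rising_exponential, t_ms, magnitude_db, p0=p0, bounds=bounds)
    return params[2]


rng = np.random.default_rng(0)
t = np.arange(0.0, 300.0, 2.5)                  # ms after noise onset
clean = rising_exponential(t, -5.0, 8.0, 22.0)  # synthetic trace, tau = 22 ms
noisy = clean + rng.normal(0.0, 0.3, t.size)    # measurement-like jitter (dB)
tau_hat = estimate_time_constant(t, noisy)
print(f"estimated tau: {tau_hat:.1f} ms")
```

Applied to each of several traces, the fitted values could then be summarized by their range and mean, as reported above.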
Also evident in Fig. 11 is that repeated measurements of the response to the initial 100 ms of tone-alone can be quite similar in strength across tests and sessions, unlike the variation seen for the dynamic segment of the nSFOAE response. Concretely, the standard deviation across the seven tests was about 0.6 dB for the response to tone-alone. Occasional exceptions to this regularity for tone-alone also have been observed (e.g., bottom panel of Fig. 8), but when those nSFOAE responses were normalized to the typical magnitude seen for the tone-alone stimulus, the shape of the response was similar to that seen in more typical sessions. Although the placement of the probe tip surely differed slightly across the test runs that showed this variability, stimulus level should have been affected relatively little because it was adjusted in the ear canal before every test run.
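The normalization described above can be expressed as a constant shift in dB. The sketch assumes magnitudes in dB and a hypothetical reference level, `typical_tone_alone_db`; each trace is shifted so that its mean magnitude during the initial tone-alone segment (first 100 ms) matches the reference. The exact normalization used in the study is not specified, so this is an illustration only. Because the shift is constant, the shape of the trace is unchanged.

```python
import numpy as np


def normalize_to_tone_alone(t_ms, magnitude_db, typical_tone_alone_db,
                            tone_alone_end_ms=100.0):
    """Shift a trace (dB) so that its tone-alone segment matches a reference.

    typical_tone_alone_db: hypothetical reference level, e.g., the mean
    tone-alone magnitude for this ear across typical sessions.
    """
    tone_alone = magnitude_db[t_ms < tone_alone_end_ms]
    offset_db = typical_tone_alone_db - tone_alone.mean()
    return magnitude_db + offset_db


t = np.arange(0.0, 400.0, 5.0)  # ms; synthetic noise onset at 100 ms
# Synthetic "typical" trace: flat tone-alone segment, then a rising segment.
base = np.where(t < 100.0, 0.0, 6.0 * (1.0 - np.exp(-(t - 100.0) / 22.0)))
anomalous = base - 4.0          # same shape, uniformly 4 dB weaker
normalized = normalize_to_tone_alone(t, anomalous, typical_tone_alone_db=0.0)
print(np.max(np.abs(normalized - base)))
```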
Also shown in Fig. 11, but not included in the summary statistics provided, is the episode described above in which subject JZ was tested while drowsy and then again after a brief, brisk walk and hydration. Note that JZ’s nSFOAE response to tone-alone was affected much less than his response to tone-plus-noise during his drowsy test session. These examples reinforce the suggestion that the mechanisms responsible for the rising, dynamic segment of the nSFOAE response are affected by everyday fluctuations in alertness or physical fatigue. Guinan and his colleagues also commented on large differences across individuals and test sessions with their SFOAE measure (e.g., Lilaonitkul and Guinan, 2009a).
As mentioned above, one weakness of our nSFOAE measure is that it exists only when a nonlinear component is present in the SFOAE. Also, while our nSFOAE measure does reveal the existence of a nonlinear component in the SFOAE (a failure of simple additivity), it does not reveal the origin of that nonlinear component. For example, the same magnitude of nSFOAE response could arise from a compressive process, as described above, or from an expansive one.
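The ambiguity noted above can be shown with a toy calculation. The sketch assumes a hypothetical power-law input/output function, y = gain * x**exponent, and defines the residual as two responses to an input x minus one response to 2x. A compressive exponent and an expansive exponent chosen so that 2**c1 + 2**c2 = 4 give residuals of equal magnitude and opposite sign, so any measure that keeps only the magnitude of the residual cannot tell them apart.

```python
import math


def residual(x, gain, exponent):
    """Toy additivity failure: two responses to x minus one response to 2x,
    for a hypothetical input/output function y = gain * x**exponent."""
    single = gain * x ** exponent
    double = gain * (2 * x) ** exponent
    return 2 * single - double


c_compressive = math.log2(1.5)  # about 0.585, compressive (< 1)
c_expansive = math.log2(2.5)    # about 1.322, expansive (> 1)
r_comp = residual(1.0, 10.0, c_compressive)
r_exp = residual(1.0, 10.0, c_expansive)
print(f"compressive residual: {r_comp:+.3f}")
print(f"expansive residual:   {r_exp:+.3f}")
```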
Final comment
It is likely that some auditory psychophysical tasks are heavily affected by the initial responses in the cochlea and other psychophysical tasks less so. Also, some characteristics of particular psychophysical tasks might be determined in the cochlea and other characteristics at higher neural levels. Simple, noninvasive measures capable of helping investigators determine when the initial cochlear responses are crucial, and when they are not, would obviously be extremely valuable for both basic and clinical auditory research. At this point it is unclear whether the perstimulatory nSFOAE measure described here will prove to have such value. In this report we have described the basic characteristics of the nSFOAE response as some fundamental parameters of the acoustic stimulus were manipulated. In a companion paper (footnote 1), we report that the nSFOAE response does behave in accord with psychophysical performance in certain auditory masking conditions, but that it behaves contrary to psychophysical performance in some other regards. The interim judgment, then, appears to be that the glass is half full. Whether that is a good thing or an irrelevant thing, only more research can tell.
ACKNOWLEDGMENTS
This work was supported by a research grant awarded to D.M. by the National Institute on Deafness and Other Communication Disorders (NIDCD 00153). K.P.W. conducted this and additional research on this topic while working on a Master’s degree at The University of Texas (Walsh, 2009). Early stages of the work were reported (Walsh et al., 2008, 2009). The work profited greatly from discussions with Dr. C. A. Champlin, Dr. E. A. Strickland, Dr. M. Wojtczak, and Dr. N. F. Viemeister. Comments from two anonymous reviewers are gratefully acknowledged.
Footnotes
A companion paper that serves as the second paper in this series has been submitted to Hearing Research and is under revision (“Overshoot measured physiologically and psychophysically in the same human ears”). The interested reader is referred to this paper for detailed comparisons between our nSFOAE measure and performance on some common psychoacoustical tasks.
Initially, the sound-pressure levels of the stimuli in the ear canal were estimated and adjusted using just the ER-10A microphone to monitor the outputs of the ER-2 earphones, and an entire set of data was collected using that procedure. Later use of the Zwislocki coupler revealed that the noise spectrum was not as flat as desired and that the tones were stronger than intended. The uncorrected noise spectrum “drooped” below the target value between about 1.5 and 6.5 kHz, with the maximum deviation being about −4 dB in the range from about 3.5 to 4.5 kHz. All of the data were re-collected with those errors in level corrected, and all basic outcomes were confirmed. This indicates that these findings are not critically dependent upon the levels of the sounds; also, every outcome shown is a verification of an earlier, but unshown, measurement in that same subject.
Noise was presented contralaterally by passing the synthesized noise waveform to a second digital-to-analog converter board (PCI-4451, National Instruments, Austin, TX) installed in the G4 computer. The analog output was amplified and delivered to a third ER-2 insert earphone that was fitted into the subject’s left (contralateral) ear canal using an ER3-14A foam eartip. The gain and frequency response of this system were measured with the coupler and sound-level meter described above. Prior to each test run, the level of the noise was set using the calibration factor measured in the left ear. The wideband, lowpass, bandpass, and highpass noises differed in overall level by less than 1 dB for the ipsilateral and contralateral presentations.
References
- Backus, B. C., and Guinan, J. J., Jr. (2006). “Time-course of the human medial olivocochlear reflex,” J. Acoust. Soc. Am. 119, 2889–2904. doi:10.1121/1.2169918
- Bassim, M. K., Miller, R. L., Buss, E., and Smith, D. W. (2003). “Rapid adaptation of the 2f1-f2 DPOAE in humans: Binaural and contralateral stimulation effects,” Hear. Res. 182, 140–152. doi:10.1016/S0378-5955(03)00190-4
- Church, G. T., and Cudahy, E. A. (1984). “The time course of the acoustic reflex,” Ear Hear. 5, 235–242. doi:10.1097/00003446-198407000-00008
- Dallmayr, C. (1987). “Stationary and dynamical properties of simultaneous evoked otoacoustic emissions (SEOAE),” Acustica 63, 243–255.
- Dallos, P. (1973). The Auditory Periphery: Biophysics and Physiology (Academic, New York).
- Goodman, S. S., and Keefe, D. H. (2006). “Simultaneous measurement of noise-activated middle-ear muscle reflex and stimulus frequency otoacoustic emissions,” J. Assoc. Res. Otolaryngol. 7, 125–139. doi:10.1007/s10162-006-0028-9
- Guinan, J. J., Jr. (2006). “Olivocochlear efferents: Anatomy, physiology, function, and the measurement of efferent effects in humans,” Ear Hear. 27, 589–607. doi:10.1097/01.aud.0000240507.83072.e7
- Guinan, J. J., Jr., Backus, B. C., Lilaonitkul, W., and Aharonson, V. (2003). “Medial olivocochlear efferent reflex in humans: Otoacoustic emission (OAE) measurement issues and the advantages of stimulus frequency OAEs,” J. Assoc. Res. Otolaryngol. 4, 521–540. doi:10.1007/s10162-002-3037-3
- Keefe, D. H. (1998). “Double-evoked otoacoustic emissions. I. Measurement theory and nonlinear coherence,” J. Acoust. Soc. Am. 103, 3489–3498. doi:10.1121/1.423057
- Keefe, D. H., Schairer, K. S., Ellison, J. C., Fitzpatrick, D. F., and Jesteadt, W. (2009). “Use of stimulus-frequency otoacoustic emissions to investigate efferent and cochlear contributions to temporal overshoot,” J. Acoust. Soc. Am. 125, 1595–1604. doi:10.1121/1.3068443
- Kemp, D. T. (1980). “Towards a model for the origin of cochlear echoes,” Hear. Res. 2, 533–548. doi:10.1016/0378-5955(80)90091-X
- Kemp, D. T., and Chum, R. A. (1980). “Observations on the generator mechanism of stimulus frequency acoustic emissions—Two tone suppression,” in Psychophysical, Physiological, and Behavioral Studies in Hearing, edited by G. van den Brink and F. A. Bilsen (Delft University, Delft, The Netherlands), pp. 34–42.
- Kim, D. O., Dorn, P. A., Neely, S. T., and Gorga, M. P. (2001). “Adaptation of distortion product otoacoustic emission in humans,” J. Assoc. Res. Otolaryngol. 2, 31–40.
- Lilaonitkul, W., and Guinan, J. J., Jr. (2009a). “Reflex control of the human inner ear: A half-octave offset in medial efferent feedback that is consistent with an efferent role in the control of masking,” J. Neurophysiol. 101, 1394–1406. doi:10.1152/jn.90925.2008
- Lilaonitkul, W., and Guinan, J. J., Jr. (2009b). “Human medial olivocochlear reflex: Effects as functions of contralateral, ipsilateral, and bilateral elicitor bandwidths,” J. Assoc. Res. Otolaryngol. 10, 459–470. doi:10.1007/s10162-009-0163-1
- Lonsbury-Martin, B. L., Harris, F. P., Stagner, B. B., Hawkins, M. D., and Martin, G. K. (1990). “Distortion-product emissions in humans: II. Relations to acoustic immittance and stimulus-frequency and spontaneous otoacoustic emissions in normally hearing subjects,” Ann. Otol. Rhinol. Laryngol. Suppl. 147, 15–29.
- Maison, S., Durrant, J., Gallineau, C., Micheyl, C., and Collet, L. (2001). “Delay and temporal integration in medial olivocochlear bundle activation in humans,” Ear Hear. 22, 65–74. doi:10.1097/00003446-200102000-00007
- McFadden, D., and Pasanen, E. G. (1994). “Otoacoustic emissions and quinine sulfate,” J. Acoust. Soc. Am. 95, 3460–3474. doi:10.1121/1.410022
- Oxenham, A. J., and Bacon, S. P. (2004). “Psychophysical manifestations of compression: Normal-hearing listeners,” in Compression: From Cochlea to Cochlear Implants, edited by S. P. Bacon, R. R. Fay, and A. N. Popper (Springer, New York), pp. 62–106.
- Probst, R., Lonsbury-Martin, B. L., and Martin, G. K. (1991). “A review of otoacoustic emissions,” J. Acoust. Soc. Am. 89, 2027–2067. doi:10.1121/1.400897
- Schairer, K. S., Fitzpatrick, D., and Keefe, D. H. (2003). “Input-output functions for stimulus-frequency otoacoustic emissions in normal-hearing adult ears,” J. Acoust. Soc. Am. 114, 944–966. doi:10.1121/1.1592799
- Schairer, K. S., and Keefe, D. H. (2005). “Simultaneous recording of stimulus-frequency and distortion-product otoacoustic emission input-output functions in adult ears,” J. Acoust. Soc. Am. 117, 818–832. doi:10.1121/1.1850341
- Schairer, K. S., Ellison, J. C., Fitzpatrick, D., and Keefe, D. H. (2006). “Use of stimulus-frequency otoacoustic emission latency and level to investigate cochlear mechanics in human ears,” J. Acoust. Soc. Am. 120, 901–914. doi:10.1121/1.2214147
- Schairer, K. S., Ellison, J. C., Fitzpatrick, D., and Keefe, D. H. (2007). “Wideband ipsilateral measurements of middle-ear muscle reflex thresholds in children and adults,” J. Acoust. Soc. Am. 121, 3607–3616. doi:10.1121/1.2722213
- Shannon, R. V. (1976). “Two-tone unmasking and suppression in a forward-masking situation,” J. Acoust. Soc. Am. 59, 1460–1470. doi:10.1121/1.381007
- Strickland, E. A. (2004). “The temporal effect with notched-noise maskers: Analysis in terms of input-output functions,” J. Acoust. Soc. Am. 115, 2234–2245. doi:10.1121/1.1691036
- von Klitzing, R., and Kohlrausch, A. (1994). “Effect of masker level on overshoot in running- and frozen-noise maskers,” J. Acoust. Soc. Am. 95, 2192–2201. doi:10.1121/1.408679
- Walsh, K. P. (2009). “Psychophysical and physiological measures of dynamic cochlear processing,” Master's thesis, The University of Texas at Austin, Austin, TX.
- Walsh, K. P., Pasanen, E. G., and McFadden, D. (2008). “Overshoot measured psychophysically and physiologically in the same ears,” Assoc. Res. Otolaryngol. Abstr. 31, 927.
- Walsh, K. P., Pasanen, E. G., and McFadden, D. (2009). “Evidence for dynamic cochlear processing in otoacoustic emissions and behavior (A),” J. Acoust. Soc. Am. 125, 2720.
- Whitehead, M. L. (1991). “Slow variations of the amplitude and frequency of spontaneous otoacoustic emissions,” Hear. Res. 53, 269–280. doi:10.1016/0378-5955(91)90060-M
- Wiederhold, M. L., and Kiang, N. Y. S. (1970). “Effects of electric stimulation of the crossed olivocochlear bundle on single auditory-nerve fibers in the cat,” J. Acoust. Soc. Am. 48, 950–965. doi:10.1121/1.1912234
- Wilson, R. H., and Margolis, R. H. (1999). “Acoustic-reflex measurements,” in Contemporary Perspectives in Hearing Assessment, edited by F. E. Musiek and W. F. Rintelmann (Allyn and Bacon, Boston, MA), pp. 131–166.
- Zwicker, E., and Schloth, E. (1984). “Interrelation of different oto-acoustic emissions,” J. Acoust. Soc. Am. 75, 1148–1154. doi:10.1121/1.390763