Abstract
Cochlear mechanics tends to be studied using single-location measurements of intracochlear vibrations in response to acoustical stimuli. Such measurements, due to their invasiveness and often the instability of the animal preparation, are difficult to accomplish and, thus, ideally require stimulus paradigms that are time efficient, flexible, and result in high resolution transfer functions. Here, a swept-sine method is adapted for recordings of basilar membrane impulse responses in mice. The frequency of the stimulus was exponentially swept from low to high (upward) or high to low (downward) at varying rates (from slow to fast) and intensities. The cochlear response to the swept-sine was then convolved with the time-reversed stimulus waveform to obtain first and higher order impulse responses. Slow sweeps of either direction produce cochlear first to third order transfer functions equivalent to those measured with pure tones. Fast upward sweeps, on the other hand, generate impulse responses that typically ring longer, as observed in responses obtained using clicks. The ringing of the impulse response in mice was of relatively small amplitude and did not affect the magnitude spectra. It is concluded that swept-sine methods offer flexible and time-efficient alternatives to other approaches for recording cochlear impulse responses.
I. INTRODUCTION
Linear, time-invariant systems are fully characterized by their impulse response (i.e., by their response to an ideal pulse or Dirac delta function) in the time domain or their transfer function in the frequency domain. Although clearly nonlinear, cochlear mechanical transfer functions are typically measured with pure tones rather than pulses or clicks under the assumption of quasi-linear processing at each stimulus level. Whereas to a first approximation, cochlear processing appears to comply with this assumption, it is well established that cochlear impulse responses do systematically differ when measured with dynamic sounds, such as clicks, vs steady-state sounds, such as tones or noise (Recio-Spinoso et al., 2009; Rhode and Recio, 2001; Versteegh and van der Heijden, 2012).1 Here, we explore an alternative stimulus to estimate cochlear impulse responses, a swept-sine, whose frequency changes smoothly over time. The first and higher order (i.e., harmonic distortions) cochlear impulse responses can be extracted in the time domain through the convolution between the measured response to an exponential frequency sweep and the time-reversed stimulus waveform (Farina, 2007). By adjusting the rate of the frequency change, the sweep stimulus can be varied from click-like (fast rate) to tone-like (slow rate), and by adjusting the direction of frequency change (upward vs downward), one can either enhance or reduce the natural cochlear dispersion (Summers et al., 2003). We use volumetric optical coherence tomography and vibrometry (VOCTV) to measure intracochlear vibrations from the basilar membrane (BM) in mice in response to sweep stimuli of varying parameters (level, rate, and direction of frequency change) and compare them to responses measured with commonly used pure tones and clicks. 
The overall goal of this study is to show that the swept-sine method offers greater flexibility for control over stimulus parameters while evoking impulse responses equivalent to those measured with more traditional stimuli.
Cochlear mechanical transfer functions are typically measured with pure tones of varying frequency due to the simplicity of the stimulus and relative ease of the interpretation of the measured responses (e.g., minimized potential for nonlinear interactions in time and/or frequency domain). Presenting one tone at a time, however, can be time-consuming and, therefore, impractical, especially when high resolution data are needed (e.g., for phase unwrapping). One means of expediting data collection is to use a multitone (or “zwuis”) stimulus, which allows one to deliver many frequencies simultaneously while avoiding overlap of higher order distortion products (van der Heijden and Joris, 2003). While multitone stimuli can speed up the data collection, due to the nonlinear nature of the cochlear processing, the intertone interactions (e.g., mutual suppression) cannot be fully controlled for (e.g., linearization of the response). For instance, Versteegh and van der Heijden (2012) showed that as a result of cochlear suppression, the “effective” level of the zwuis complex as compared to tone sound pressure level (SPL) is typically higher by an amount that varies considerably across preparations. Finally, the uneven frequency spacing of the multitone stimulus can create phase unwrapping ambiguities. Another type of complex stimulus used in auditory research (particularly for auditory-nerve recordings) is white noise (Carney and Yin, 1988). As in the case of tone complexes, the effective level of the noise stimulus must be corrected as compared to the pure tone SPL. This method offers a convenient way to extract impulse response and odd-order nonlinearities as the first order cross correlation between the noise and response (the first order Wiener kernel; Recio-Spinoso et al., 2009).
Finally, clicks are especially useful because, being punctate and wideband in nature, they permit precise timing of a system's response while simultaneously testing it over a wide range of frequencies. Although the click is the optimal choice of stimulus for measuring the impulse response of a nonlinear system (e.g., Recio et al., 1998), it suffers from several limitations, such as high crest factors and difficulties with spectral calibration (e.g., obtaining a flat spectrum).
The swept-sine method, originally developed for the measurement of room acoustics, has several advantages over other commonly used approaches. As compared to discrete-tone stimuli, the sweep method offers high frequency resolution (accurate calculation of phase gradients) and improved efficiency of data collection (Abdala et al., 2015). Unlike other complex stimuli, such as multitones and clicks, calibration of a sweep stimulus is straightforward (equivalent SPL as tones), and multitone nonlinear interactions should be minimized, especially if the rate of frequency change is relatively slow. Compared to clicks, the energy in the swept stimulus is spread out over time, resulting in considerably lower crest factors and lesser risk of creating distortions in the acoustical system. Finally, the exponential frequency sweep (i.e., a swept-sine whose instantaneous frequency changes exponentially over time) allows one to deconvolve, simultaneously, the linear impulse response of the system and separate responses for each order of harmonic distortion in the time domain in a straightforward manner (Farina, 2000).
The advantages of the swept-tone stimulus have been recognized for recordings of otoacoustic emissions (OAEs), which are faint sounds generated by the cochlea. Because of the small magnitude of OAEs, the recording often requires tedious averaging, thus, time efficiency is of the essence. The desired signal can be extracted from the response to a swept tone via several methods, such as least-squares fit (LSF), all-pass inverse filtering in the frequency domain, and convolution with the reversed stimulus in the time domain (e.g., Bennett and Özdamar, 2010; Keefe et al., 2016; Long et al., 2008). In this study, we use the convolution method for measuring cochlear impulse responses with exponential sweeps (Farina, 2000). Whereas the LSF method is convenient and allows for adjustment of analysis parameters post hoc, its major drawback is the need to assume a priori parameters of the extracted signal. Although the all-pass inverse filtering in theory is equivalent to the method of convolution with reverse filter described here, previous implementations of this method differed in the stimulus design and analysis approaches (Keefe et al., 2016).
A. Overview of the swept-sine method
An example of application of the swept-sine method is schematized in Fig. 1. The sweep stimulus, $s(t)$, is designed such that its instantaneous frequency increases (r > 0) or decreases (r < 0) exponentially with time t from f1 to f2 with rate r,
| $s(t) = \sin\left[\dfrac{2\pi f_1}{r \ln 2}\left(2^{rt} - 1\right)\right]$ | (1) |
FIG. 1.

An application of the swept-sine method for extracting the first and higher order impulse responses of a broadband nonlinear system. See the text for details. In this example, the sweep is upward (r > 0) and the delays, $\Delta t_m$, are negative (i.e., the higher order impulse responses precede the lower order impulse responses). For a downward sweep (r < 0), the situation is reversed with the higher order responses following the lower order responses.
The rate, r, of the exponential change in frequency is calculated for an approximate duration of the stimulus, $\hat{T}$, as
| $r = \dfrac{f_1}{\ln 2 \, \left\lceil f_1 \hat{T} / \ln(f_2/f_1) \right\rceil}$ | (2) |
where the brackets “⌈ ⌉” indicate the ceiling to the nearest integer.2 This procedure is introduced to avoid non-synchronization of phases of the higher order impulse responses [i.e., $h_m(t)$ for m > 1; Novak et al., 2015].3 The new duration of the stimulus, T, is recalculated as
| $T = \dfrac{\log_2(f_2/f_1)}{r}$ | (3) |
The new duration of the stimulus is then used to recalculate the sweep waveform [Eqs. (1) and (2)]. A time-frequency representation of a sweep stimulus is shown in Fig. 1(A). Because the instantaneous frequency changes exponentially over time (i.e., sweeps slow/fast at low/high frequencies), the spectrum of the stimulus is pink (i.e., −3 dB/oct; Farina, 2000, 2007).
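The design steps above can be sketched numerically. The following is a minimal illustration rather than the authors' implementation: it assumes an instantaneous frequency of f1·2^(rt), a phase that starts at zero, and the synchronization condition of Novak et al. (2015), in which the phase constant 2πf1/(r ln 2) is forced to an integer multiple of 2π. The function names and the 40-ms target duration are hypothetical.

```python
import math

def synchronized_sweep_params(f1, f2, approx_duration):
    """Return (k, rate in oct/s, duration in s) for a synchronized sweep.

    Assumption: k = ceil(f1 * T_hat / ln(f2 / f1)), so that the phase
    constant 2*pi*f1 / (rate * ln 2) equals 2*pi*k with integer k.
    """
    k = math.ceil(f1 * approx_duration / math.log(f2 / f1))
    rate = f1 / (k * math.log(2))            # octaves per second
    duration = math.log2(f2 / f1) / rate     # seconds
    return k, rate, duration

def exponential_sweep(f1, rate, duration, fs):
    """Sample sin(2*pi*f1*(2**(rate*t) - 1) / (rate*ln 2)) at rate fs."""
    phase_scale = 2 * math.pi * f1 / (rate * math.log(2))
    n = round(duration * fs)
    return [math.sin(phase_scale * (2 ** (rate * i / fs) - 1)) for i in range(n)]

# Upward (0.5 -> 40 kHz) and downward (40 -> 0.5 kHz) sweeps, 40-ms target
k_up, r_up, t_up = synchronized_sweep_params(500.0, 40000.0, 0.040)
k_dn, r_dn, t_dn = synchronized_sweep_params(40000.0, 500.0, 0.040)
s_up = exponential_sweep(500.0, r_up, t_up, fs=100000)
```

Under these assumptions, the recalculated durations differ from the 40-ms target by a few milliseconds at most, and the rate is negative for the downward sweep.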
For each sweep stimulus, $s(t)$, its time-reversed replica, $s^{-1}(t)$, is derived with an amplitude modulation (i.e., +3 dB/oct) such that convolution (*) between the two gives a Dirac delta function, $\delta(t)$,
| $s(t) * s^{-1}(t) = \delta(t)$ | (4) |
The signal, $s^{-1}(t)$, is called the “inverse filter” [Fig. 1(B)].
The system's response to the sweep stimulus [$y(t)$, Fig. 1(C)] is convolved with the inverse filter to obtain its transfer function, $h(t)$ [Fig. 1(D)],
| $h(t) = y(t) * s^{-1}(t)$ | (5) |
The transfer function, $h(t)$, can be decomposed into a series of higher order impulse responses, $h_m(t)$ [Fig. 1(E)], based on their time lag, $\Delta t_m$, relative to the first impulse response,
| $h(t) = \sum_{m \geq 1} h_m(t - \Delta t_m)$ | (6) |
where the delay for the mth impulse response is calculated as
| $\Delta t_m = -\dfrac{\log_2 m}{r}$ | (7) |
The first order impulse response as well as higher order distortions can now be extracted using a time window based on their respective delays [Figs. 1(F)–1(H)].
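To give a sense of scale for the windowing step, the sketch below computes where the second and third order impulse responses would land relative to the first order response. It assumes the delay of the mth order response is −log2(m)/r, as in the exponential-sweep formulations of Farina (2000) and Novak et al. (2015), and uses three sweep rates taken from Table I. The function and dictionary names are hypothetical.

```python
import math

def harmonic_delay(order, rate):
    """Delay of the order-th impulse response re the first order response.

    Assumption: delay = -log2(order) / rate, in the time unit of 1/rate.
    Negative values mean the response precedes the first order response.
    """
    return -math.log2(order) / rate

# Sweep rates in oct/ms (values from Table I)
rates_oct_per_ms = {"T10 up": 0.720, "T300 up": 0.021, "T10 down": -0.634}

# Delays (ms) of the first, second, and third order responses
delays_ms = {name: [harmonic_delay(m, r) for m in (1, 2, 3)]
             for name, r in rates_oct_per_ms.items()}
```

For the fast upward sweep, the second order response sits roughly 1.4 ms ahead of the first order response, whereas for the slow upward sweep the separation is close to 48 ms, which leaves far more room for a long analysis window.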
II. METHODS
A. Animal preparation
The swept-sine method was tested in vivo in deeply anesthetized adult (∼4–7-week-old) mice (CBA/CaJ) of both sexes (15 mice; 8 male). The animals were assumed to be normally hearing as assessed via recordings of distortion product OAEs and high gain in the BM vibrations evoked with low intensity sweep stimulus. The anesthesia was induced with ketamine (80–100 mg/kg) and xylazine (5–10 mg/kg). Supplemental doses of anesthesia (1/4 of the induction dose) were given throughout the experiment to ensure areflexia. The core body temperature was maintained at ∼37 °C. The head was fixed in a head-holder, and a ventrolateral approach was used to surgically access the left bulla and expose the top of the otic capsule. The left ear-canal was resected and a tip of an acoustic probe (containing a microphone and speakers; ER10X, Etymotic Research, Elk Grove, IL) was sealed with dental cement around the rim of the ear-canal such that the probe's tip was within ∼1–2 mm from the eardrum. At the end of the experiment, the animals were euthanized via overdose of the anesthetic and, in some cases, a set of measurements was obtained post mortem. All of the procedures were approved by the Institutional Animal Care and Use Committee at the University of Southern California (USC).
B. Optical coherence tomography and vibrometry
We used VOCTV to image the apical turn of the intact cochlea directly through the otic capsule bone and record its motions in response to acoustical stimulation, as described in detail elsewhere (Dewey et al., 2019; Gao et al., 2014). The custom-built VOCTV system consisted of a broadband swept-source laser (Insight Photonic Solutions, Inc., Lafayette, CO, center-wavelength, 1310 nm; bandwidth, 95 nm; sweep rate, 100 kHz), and a high-speed digitizer (AlazarTech ATS9373 card; Alazar Technologies Inc., Pointe-Claire, QC, Canada) connected to a desktop personal computer (PC). The laser light was directed at the cochlea via a mirror adapter attached to the bottom of the dissecting microscope (Stemi-2000, Zeiss, Jena, Germany). First, a cross-sectional image of the apical turns was obtained (B-scan) and voxels with high reflectivity near BM were chosen for vibrometry (Lin et al., 2017). Typically, the angle between the light source and the BM was ∼65°, meaning that the measured vibrations capture mostly BM transverse motion. All of the measurements were performed in an electrically shielded sound-attenuating booth with the animal placed on the vibration isolation table.
C. Stimulus design
All of the acoustical stimuli were designed digitally (100-kHz sampling rate) and delivered through one of the sound sources of the ER10X probe. The stimuli were calibrated in situ to produce a desired amplitude and phase at the probe microphone (i.e., all of the stimuli were corrected for the speaker's complex frequency response in situ). Stimuli consisted of pure tones, clicks, and exponential frequency sweeps, presented at varying levels [SPL or peak-equivalent sound pressure level (peSPL) for clicks].
Cochlear responses to tonal stimuli (100-ms duration, plus 2-ms, cosine-shaped onset/offset ramps) were averaged 20 times, and only the response to the steady-state portion of the stimulus was analyzed (i.e., the ramps were discarded). The tone frequency was varied from 1 to 15 kHz in 0.5 kHz steps, and the tone level was varied from 10 to 100 dB SPL in 10 dB steps. The waveforms were converted to the frequency domain via fast Fourier transform (FFT), and vibratory responses were evaluated at the frequency of the tone stimulus and its integer multiples (harmonic responses).
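Evaluating the response at the tone frequency and its integer multiples amounts to reading individual FFT bins. The sketch below does this with a single-bin discrete Fourier transform on a synthetic waveform. The amplitudes, the phase, and the 9-kHz tone are made-up illustration values, and the function name is hypothetical. The result is exact only when the analysis window holds an integer number of stimulus cycles, as is the case for 100 ms and 0.5-kHz frequency steps.

```python
import cmath
import math

def complex_amplitude(x, fs, freq):
    """Complex amplitude of the component of x at freq (single-bin DFT)."""
    n = len(x)
    acc = 0j
    for i, v in enumerate(x):
        acc += v * cmath.exp(-2j * math.pi * freq * i / fs)
    return 2 * acc / n   # factor 2: a real cosine splits into +/- frequencies

fs = 100_000     # sampling rate (Hz)
f0 = 9_000       # tone frequency (Hz); hypothetical example value
n = 10_000       # 100-ms steady-state segment

# Synthetic response: fundamental plus weak second and third harmonics
response = [10.0 * math.cos(2 * math.pi * f0 * i / fs + 0.25)
            + 0.5 * math.cos(2 * math.pi * 2 * f0 * i / fs)
            + 0.1 * math.cos(2 * math.pi * 3 * f0 * i / fs)
            for i in range(n)]

# Fundamental (m = 1) and harmonic (m = 2, 3) components
harmonics = {m: complex_amplitude(response, fs, m * f0) for m in (1, 2, 3)}
```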
The acoustic clicks were designed in the frequency domain to produce a flat spectrum over ∼7–15 kHz (the 3-dB bandwidth), using a third order recursive exponential filter, and converted to the time domain via an inverse FFT (Charaziak and Shera, 2021; Shera and Zweig, 1993). The resulting click stimulus waveform (∼120 μs in duration) was presented with a 2-ms delay (re the beginning of the 10-ms buffer), but the time origin for all of the analyses was placed at the centroid of the click stimulus. The responses were averaged 1000 times. The click level was varied from 30 to 100 dB peSPL in 5 dB steps.
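One way to picture the click's target spectrum is sketched below. It uses the recursive exponential filter as defined by Shera and Zweig (1993): S_n(x) = 1/Γ_n(λ_n x), with Γ_1(x) = exp(x²), Γ_{n+1}(x) = exp(Γ_n(x) − 1), and λ_n chosen so that S_n(1) = 1/e. Centering the filter midway between 7 and 15 kHz on a linear frequency axis, and placing the −3 dB points exactly at those band edges, are assumptions made for illustration. The function names are hypothetical.

```python
import math

def _lambda_sq(order):
    """Squared scale factor: gamma_1 = 1, gamma_{n+1} = ln(gamma_n + 1)."""
    g = 1.0
    for _ in range(order - 1):
        g = math.log(g + 1.0)
    return g

def recursive_exponential(x, order=3):
    """Window equal to 1 at x = 0 and 1/e at |x| = 1."""
    try:
        g = math.exp(_lambda_sq(order) * x * x)   # Gamma_1
        for _ in range(order - 1):
            g = math.exp(g - 1.0)                 # Gamma_{k+1}
    except OverflowError:
        return 0.0                                # far outside the passband
    return 1.0 / g

def x_at_gain(gain, order=3):
    """|x| at which the window equals gain (0 < gain < 1)."""
    g = 1.0 / gain
    for _ in range(order - 1):
        g = math.log(g) + 1.0
    return math.sqrt(math.log(g) / _lambda_sq(order))

def click_magnitude(freq_hz, f_lo=7000.0, f_hi=15000.0, order=3):
    """Target magnitude with -3 dB points at f_lo and f_hi (assumed layout)."""
    center = 0.5 * (f_lo + f_hi)
    half_bw = 0.5 * (f_hi - f_lo)
    f_cut = half_bw / x_at_gain(1.0 / math.sqrt(2.0), order)
    return recursive_exponential((freq_hz - center) / f_cut, order)
```

The higher the filter order, the flatter the passband and the steeper the skirts. The time-domain click would then follow from an inverse FFT of this magnitude combined with the in situ speaker correction, which is not sketched here.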
The sweeps were designed such that the instantaneous frequency changed from either 0.5 to 40 kHz (upward sweep; r > 0) or from 40 to 0.5 kHz (downward sweep, r < 0) as described in Sec. I A. An onset/offset tonal ramp (2-ms duration) with a frequency corresponding to the start/end frequency was added to minimize any transients produced by the acoustical system. The stimulus waveforms were zero-padded (5-ms) at the beginning and end of the buffer. The ramps and zero-paddings were discarded prior to the analyses. In addition to sweep direction, we also varied the sweep duration (Table I). The shorter the duration of the sweep stimulus, the faster the rate of the frequency change (oct/ms). The sweep level was varied from 40 to 80 dB SPL in 20 dB steps, apart from the slow upward sweep stimulus condition in which additional levels (10–100 dB SPL in 10 dB steps) were collected. The sweep responses were averaged ∼400–1000 times to improve the signal-to-noise ratio (SNR).4 The average cochlear response was then convolved with the inverse filter. Next, the first and higher order impulse responses were separated using time windows with time zero set at the appropriate delay, $\Delta t_m$, for each mth impulse response [Eq. (7)]. Note that the sweep duration must be carefully picked so that the delay, $|\Delta t_m|$, is longer than the duration of the impulse response.
TABLE I.
Swept-sine stimulus conditions.
| Notation | Duration, upward (ms) | Duration, downward (ms) | Rate, upward (oct/ms) | Rate, downward (oct/ms) |
|---|---|---|---|---|
| T10 (fast) | 8.78 | 9.98 | 0.720 | −0.634 |
| T20 | 17.54 | 20.06 | 0.360 | −0.315 |
| T40 | 43.84 | 40.00 | 0.144 | −0.158 |
| T75 (mid) | 78.88 | 75.06 | 0.080 | −0.084 |
| T150 | 149.00 | 149.98 | 0.042 | −0.042 |
| T300 (slow) | 297.98 | 299.96 | 0.021 | −0.021 |
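The rates in Table I follow directly from the sweep span and duration: the 0.5–40 kHz range covers log2(80) ≈ 6.32 octaves, so the rate is this span divided by the duration, with a negative sign for downward sweeps. The short check below recomputes the rates from the tabulated durations. The variable and function names are hypothetical.

```python
import math

OCTAVES = math.log2(40.0 / 0.5)   # span of the 0.5-40 kHz sweep, ~6.32 oct

# Durations (ms) from Table I: (upward, downward)
DURATIONS_MS = {
    "T10": (8.78, 9.98), "T20": (17.54, 20.06), "T40": (43.84, 40.00),
    "T75": (78.88, 75.06), "T150": (149.00, 149.98), "T300": (297.98, 299.96),
}

# Rates (oct/ms) from Table I: (upward, downward)
TABLE_RATES = {
    "T10": (0.720, -0.634), "T20": (0.360, -0.315), "T40": (0.144, -0.158),
    "T75": (0.080, -0.084), "T150": (0.042, -0.042), "T300": (0.021, -0.021),
}

def sweep_rate(duration_ms, direction):
    """Rate in oct/ms; direction is +1 (upward) or -1 (downward)."""
    return direction * OCTAVES / duration_ms

computed = {name: (sweep_rate(up, +1), sweep_rate(down, -1))
            for name, (up, down) in DURATIONS_MS.items()}
```

The recomputed values agree with the tabulated rates to within about 0.001 oct/ms, a residual that is consistent with rounding of the listed durations.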
While the swept-sine analysis results in a cochlear response already normalized to the stimulus pressure (nm/Pa), the tonal and click responses were normalized to the corresponding stimulus pressure as measured in the ear-canal. We used measurements obtained at high intensities (to avoid contamination from OAEs) and scaled them down to normalize the displacement data (Kalluri and Shera, 2007).
In all cases, the noisiest recordings (typically less than 10%) were identified post hoc (trials with noise floors exceeding average noise levels by two or more standard deviations) and excluded from the final average. The noise was estimated from a waveform corresponding to a difference between averages (V) stored in two separate buffers [A and B; i.e., noise = (VA – VB)/2].
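The two-buffer noise estimate and the rejection rule can be sketched as follows. Any response component common to both buffers cancels in the half-difference, which leaves an estimate of the noise in the average. The use of the sample standard deviation, the per-trial noise values, and the function names are assumptions made for illustration.

```python
import statistics

def average_and_noise(buffer_a, buffer_b):
    """Return the response estimate (A + B)/2 and noise estimate (A - B)/2."""
    avg = [(a + b) / 2 for a, b in zip(buffer_a, buffer_b)]
    noise = [(a - b) / 2 for a, b in zip(buffer_a, buffer_b)]
    return avg, noise

def keep_quiet_trials(noise_levels, n_sd=2.0):
    """Indices of trials whose noise level is below mean + n_sd * SD."""
    limit = statistics.mean(noise_levels) + n_sd * statistics.stdev(noise_levels)
    return [i for i, level in enumerate(noise_levels) if level < limit]

# Two buffers holding the same response with opposite noise samples
avg, noise = average_and_noise([1.1, 1.9], [0.9, 2.1])

# Hypothetical per-trial noise floors (arbitrary units); the last is an outlier
trial_noise = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 5.0]
kept = keep_quiet_trials(trial_noise)
```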
To monitor the stability of the preparation, we repeatedly measured the response to an upward 40-dB SPL slow sweep.
1. Frequency-domain analyses
The click- and swept-sine-derived impulse responses were time-windowed using a recursive exponential window with cutoffs of −0.5 and 1.5 ms re zero time and converted to the frequency domain via FFT. Choosing a relatively short time window improved the SNR at low SPL, and it did not affect the spectral characteristics of the response as compared to data analyzed using a longer window (−1 to 4 ms). Noisy points were identified using a Rayleigh test. First, the recording blocks were arranged into ten subaverages and for each spectral component, a Rayleigh test (p < 0.001) of its ten phase values determined whether the point was accepted (Versteegh and van der Heijden, 2012). We found that this approach proved more successful in rejecting noisy points than applying a 10-dB SNR criterion. Finally, when extracting the best frequency (BF) of any transfer function, the minimal peak prominence was required to be at least 6 dB. The characteristic frequency (CF) was estimated as the BF of the transfer functions measured in response to either 40-dB SPL tones, 40-dB SPL sweeps, or 55-dB peSPL clicks. The ratios of the CFs extracted from sweep vs tone data and sweep vs click data were 0.995 (±0.021) and 1.02 (±0.01), respectively. We used the sweep CF as reference for all of the calculations and interpolated the data when necessary.
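The phase-consistency criterion can be illustrated with a small Rayleigh test on ten phase values. The p-value below uses a common closed-form approximation from the circular-statistics literature (Zar's formula), which is an assumption here because the text does not state which approximation was used. Phases are taken in radians, and the function name is hypothetical.

```python
import cmath
import math

def rayleigh_p(phases_rad):
    """Approximate p-value of the Rayleigh test for non-uniformity of phase."""
    n = len(phases_rad)
    # Length of the resultant vector (n when all phases coincide)
    resultant = abs(sum(cmath.exp(1j * p) for p in phases_rad))
    resultant = min(resultant, n)   # guard against rounding above n
    return math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - resultant ** 2))
                    - (1 + 2 * n))

# Ten subaverages with nearly identical phase: the point would be accepted
locked = [0.10, 0.12, 0.09, 0.11, 0.10, 0.08, 0.13, 0.10, 0.11, 0.09]
# Ten phases spread evenly around the circle: the point would be rejected
scattered = [2 * math.pi * k / 10 for k in range(10)]

accept_locked = rayleigh_p(locked) < 0.001
accept_scattered = rayleigh_p(scattered) < 0.001
```

With only ten subaverages, p < 0.001 is reached only when the phases are tightly clustered, which is why the test can reject points that a fixed SNR criterion would pass.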
2. Time-domain analyses
For the time-domain analysis, the synthetic impulse responses were extracted with a longer window (−1 to 4 ms) and bandpass filtered with a Kaiser window (−2 to 0.5 oct re CF). The envelope and instantaneous frequency of the impulse response were extracted using the Hilbert transform. If, at a given time instance, the envelope of the signal was below mean +1.96SD (standard deviation) of the noise envelope, the data point was rejected from the analyses.
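The envelope and instantaneous frequency extraction can be sketched with a discrete analytic signal, together with the noise criterion described above. The example builds the analytic signal with a direct (slow) DFT so that it needs no external library; in practice an FFT-based Hilbert transform routine would be used. The test waveform, a 10-kHz tone sampled at 100 kHz, and all function names are assumptions for illustration.

```python
import cmath
import math
import statistics

def analytic_signal(x):
    """Analytic signal: keep positive frequencies (doubled), drop negative."""
    n = len(x)
    spectrum = [sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
                for j in range(n)]
    for j in range(1, n):
        if j < n / 2:
            spectrum[j] *= 2
        elif j > n / 2:
            spectrum[j] = 0
    return [sum(spectrum[j] * cmath.exp(2j * math.pi * j * k / n)
                for j in range(n)) / n
            for k in range(n)]

def envelope_and_frequency(x, fs):
    """Hilbert envelope and instantaneous frequency (Hz) of x."""
    z = analytic_signal(x)
    envelope = [abs(v) for v in z]
    # Phase advance between neighboring samples avoids explicit unwrapping
    inst_freq = [cmath.phase(b * a.conjugate()) * fs / (2 * math.pi)
                 for a, b in zip(z, z[1:])]
    return envelope, inst_freq

def above_noise(envelope, noise_envelope, n_sd=1.96):
    """True where the envelope reaches mean + n_sd * SD of the noise envelope."""
    limit = (statistics.mean(noise_envelope)
             + n_sd * statistics.stdev(noise_envelope))
    return [e >= limit for e in envelope]

fs = 100_000
tone = [math.cos(2 * math.pi * 10_000 * k / fs) for k in range(200)]
env, freq = envelope_and_frequency(tone, fs)
valid = above_noise([0.5, 3.0], [1.0, 1.2, 0.8, 1.0])
```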
3. Group averages
When calculating means across animals, only points meeting our noise criteria were included. If data were sampled with different resolution (e.g., at different frequencies or time points), the curves were resampled for averaging. Furthermore, a mean value was calculated only if at least 75% of points were present across the animals (50% for post mortem data). All of the error bars display 95% percentile confidence intervals (CIs) obtained via bootstrapping [blocked by the animal identification (ID)].
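A percentile bootstrap blocked by animal can be sketched as follows: whole animals are resampled with replacement, so all values from one animal stay together, and the CI is read from the percentiles of the resampled means. The gain values, the number of resamples, the fixed seed, and the function name are hypothetical.

```python
import random
import statistics

def blocked_bootstrap_ci(values_by_animal, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI of the mean, resampling whole animals with replacement."""
    rng = random.Random(seed)
    animal_ids = sorted(values_by_animal)
    boot_means = []
    for _ in range(n_boot):
        drawn = [rng.choice(animal_ids) for _ in animal_ids]
        pooled = [v for animal in drawn for v in values_by_animal[animal]]
        boot_means.append(statistics.mean(pooled))
    boot_means.sort()
    lo = boot_means[int(n_boot * alpha / 2)]
    hi = boot_means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical gains (dB) at CF, one or two measurements per animal
gains_db = {"A": [41.0, 42.5], "B": [38.2], "C": [44.1, 43.0],
            "D": [40.3], "E": [39.5, 40.9]}
ci_low, ci_high = blocked_bootstrap_ci(gains_db)
```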
D. Equivalent click, sweep, and tone SPLs
To critically compare the sweep and click BM impulse responses, one must assure that they were obtained at equivalent stimulus levels. We employ an empirical approach to find click and sweep levels that result in equivalent gain at the CF (Charaziak and Shera, 2021; Kalluri and Shera, 2007; Versteegh and van der Heijden, 2012) across a wide range of stimuli (10–100 dB SPL for sweeps and 30–100 dB peSPL for clicks) in 14 animals. The average difference between click and sweep levels, ΔL= (Lc – Ls), required to match the two input-output functions at CF was equal to 15.7 dB (±0.9 dB; ±95% CI). Thus, we use ΔL of 15 dB for comparing BM transfer functions obtained with sweeps and clicks.
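One possible way to estimate the level offset is sketched below: the click input-output function is shifted along the level axis until it best overlaps the sweep function in the least-squares sense. The least-squares criterion, the grid search, the requirement of at least three overlapping points, and the synthetic compressive gain functions are all assumptions made for illustration, not the procedure used in the study.

```python
def interp(x, xs, ys):
    """Linear interpolation; returns None outside the range of xs."""
    if x < xs[0] or x > xs[-1]:
        return None
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return None

def level_offset(click_levels, click_gain, sweep_levels, sweep_gain,
                 step=0.1, max_offset=40.0):
    """Offset (dB) minimizing the mean squared gain mismatch at CF."""
    best, best_err = None, float("inf")
    for i in range(int(round(max_offset / step)) + 1):
        offset = i * step
        errs = []
        for lc, gc in zip(click_levels, click_gain):
            gs = interp(lc - offset, sweep_levels, sweep_gain)
            if gs is not None:
                errs.append((gc - gs) ** 2)
        if len(errs) >= 3 and sum(errs) / len(errs) < best_err:
            best, best_err = offset, sum(errs) / len(errs)
    return best

# Synthetic gains (dB re 1 nm/Pa): compressive growth of -0.7 dB/dB
sweep_levels = list(range(10, 101, 10))                  # dB SPL
sweep_gain = [60.0 - 0.7 * (lv - 10) for lv in sweep_levels]
click_levels = list(range(30, 101, 5))                   # dB peSPL
click_gain = [60.0 - 0.7 * (lv - 15.7 - 10) for lv in click_levels]

delta_l = level_offset(click_levels, click_gain, sweep_levels, sweep_gain)
```

Because the synthetic click function was built with a 15.7-dB shift, the search recovers that offset; with measured data the minimum would be broader and would vary across animals.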
While sweep and tone stimuli were calibrated to produce the same SPL, we calculated ΔL (LT – LS) to test the equivalency of the calibration. The average ΔL was equal to −1.9 (±0.73) dB, indicating that, on average, the tonal stimulus resulted in a slightly lower gain at the CF as compared to the sweep stimuli presented at the same SPL. This change likely results from a small decrease in cochlear sensitivity with time (tone data were typically collected after the sweep data). Indeed, the test-retest measurements of BM responses to 40-dB SPL slow sweeps indicate a small decrease in gain at CF by 1.5 (±1.03) dB through a recording session.
III. RESULTS AND DISCUSSION
First, we present a few individual examples of transfer functions and impulse responses measured across different stimulus types (Sec. III A). These examples are representative of the group data summarized in Sec. III B.
A. Individual data
1. Effect of stimulus type on the BM response
Because stimulation with pure tones is considered to be the gold standard in the field, first, we compare BM transfer functions measured with pure tones and slow upward sweeps (i.e., an infinitely slow sweep converges to a tone). As exemplified in Figs. 2(A) and 2(B), BM transfer functions measured with discrete tones (circles) and slow sweeps (lines) are nearly equivalent: magnitude and phase differences are less than ∼2 dB and ∼0.02 cycles, respectively. Note that at high frequencies (>CF), the frequency resolution of the pure tone data was often insufficient to avoid phase unwrapping ambiguities [see the upward slope of the phase in Fig. 2(B)].
FIG. 2.

(Color online) BM transfer functions (displacement re ear-canal pressure) obtained before death (colored) and after death (black) at varying sound levels (see the legend). (A) and (B) compare data from a single animal recorded in response to tones (circles) and slow upward sweeps (solid lines). (C)–(E) present transfer functions and impulse responses for another animal, measured with clicks (circles and dotted lines) and fast upward sweeps (solid lines). The scale bar at the top of (E) is rescaled by 1/3 (−10 dB) for each subpanel starting from top to bottom to reflect the compressive growth of gain with stimulus intensity. The solid vertical lines in (E) show the intensity invariance of the peak position of the impulse response. The phase curves for each level in (B) and (D) are separated by 1 cycle for clarity.
Acoustic clicks, on the other hand, can be viewed as infinitely fast sweeps. Thus, we first compare the BM transfer functions measured with clicks to responses obtained with fast upward sweeps. [For clarity, in Fig. 2 (middle and right), we only include responses to upward sweeps, and we save the discussion of the effect of sweep direction on the impulse response characteristics until Sec. III A 2.] The spectral characteristics of the BM response to sweeps and clicks are very similar within an animal despite inevitable difficulties in equalizing click and sweep stimuli levels (Sec. II D). In this particular case, however, we did see a small shift (∼3 dB) between click- and sweep-derived transfer functions post mortem [Fig. 2(C), black], which was perhaps due to a small shift in a measurement location. In the time domain, the click- and sweep-derived BM impulse responses are also well matched [Fig. 2(E)]. In all cases, the impulse responses display well documented characteristics: intensity invariance of zero crossings [see vertical lines in Fig. 2(E)], frequency glides (i.e., an increase in instantaneous frequency with time), and shift in the envelope maxima toward earlier times with increasing intensity (i.e., a decrease in the group delay).
2. Effect of the sweep parameters on the BM response
Figure 3 shows a set of BM transfer functions measured in response to either upward [Figs. 3(A) and 3(C)] or downward sweeps [Figs. 3(B) and 3(D)] of varying sweep rates (fast vs slow; dotted vs solid) and levels (color-encoded; see the legend). The spectral characteristics of the transfer functions vary very little with sweep rate or direction at any level. If anything, the only noticeable differences are consistently detected on the high frequency slope (≫CF), where the position of the spectral notch (and the corresponding phase shift) could be misaligned when measured with fast sweeps of varying directions, particularly so, at high stimulus levels (light green curves in Fig. 3). Cochlear wave propagation beyond the BF place is poorly understood, and the reason for such misalignment is not clear.
FIG. 3.
(Color online) The effect of sweep rate (slow vs fast; see Table I) on the BM transfer function. In (A) and (B), data obtained with upward sweeps are shown, whereas in (C) and (D), downward sweeps were used. Alive data (color) are for the same animal (ID0820) of Figs. 2(C) and 2(D), whereas post mortem (PM) data (black and gray) are for another representative specimen (no complete PM data set was available for ID0820).
Similarly, the corresponding sweep time-domain impulse responses are nearly indistinguishable in their characteristics especially at the early times (i.e., the “primary” response; Fig. 4). However, subtle differences arise when considering the small ringing portion of the waveform, particularly, for mid and high stimulus levels (green colors; see arrows). For upward sweeps, speeding up the sweep rate results in a slight increase in the amplitude of the ringing portion (e.g., at time ∼7 CF periods) as well as a phase lag as compared to upward slow sweep (observed as a shift in the waveform zero crossings toward later times). An opposite pattern is observed for the downward sweeps (i.e., decreased amplitude and phase lead). As a result, when comparing upward and downward fast sweep impulse responses, the difference is most pronounced [see the arrows in Fig. 4(A)]. The sweep direction does not influence the “ringing” portion of the impulse response for slow sweeps [Fig. 4(B)]. If the ringing portion of the impulse response represents an OAE waveform (Shera and Cooper, 2013), when using fast sweeps (i.e., close to the natural cochlear dispersion rate; see Sec. III B 2), upward sweeps may be more desirable to elicit stronger emissions because they produce stronger ringing in the BM response than downward sweeps.
FIG. 4.
(Color online) Time representations of the data in Fig. 3. Note that the plot emphasizes the differences in the sweep direction (up vs down and solid desaturated vs dashed saturated lines) as opposed to the sweep rate (Fig. 3). The arrows point out the most striking differences between the impulse responses. All of the time waveforms were normalized to their own envelope peak.
3. Harmonic responses
The deconvolution employed in the swept-sine method conveniently separates the first and higher order impulse responses in time (Fig. 1). Figure 5(A) demonstrates good agreement between the magnitudes of first through third order transfer functions measured with either pure tones (circles) or extracted from the swept-tone responses (solid). Typically, the two methods result in matching phase curves [Fig. 5(B)], however, insufficient frequency resolution of tone data creates phase unwrapping ambiguities (not shown). Overall, the second and third harmonic transfer functions display previously reported features (i.e., magnitude peaks at ∼CF/2 and CF; quarter cycle phase shifts) and are not discussed here further (see, e.g., Cooper, 1998; Dewey et al., 2021).
FIG. 5.
(Color online) First through third order impulse responses measured with 80-dB SPL upward sweeps and tones. (A) displays the first order transfer function (purple) as well as second and third harmonic distortions measured with slow sweeps (solid) and tones (circles). The corresponding unwrapped phases are shown in (B) for the sweep data that, unlike the tone transfer functions, had sufficient frequency resolution for unambiguous phase unwrapping. In (C)–(E), the corresponding time waveforms are shown when measured in response to sweeps of varying rates (normalized to own peak envelope).
When the sweep stimulus is “synchronized” [Eqs. (2) and (3)], the higher order impulse responses are independent of the sweep rate [Fig. 5(C)] as expected (Novak et al., 2015). However, if a non-synchronized sweep is used, the higher order impulse response phase must be corrected for because it becomes dependent on the sweep rate (not shown).
B. Group data
1. Frequency properties of the BM transfer function
Figure 6 shows average BM transfer function magnitudes (upper) and phase-gradient group delays (bottom) measured across ranges of stimulus intensities for tones, clicks, and sweeps. When compared at the same (or equivalent in the case of the click) levels, the different temporal properties of the stimulus (click vs tone vs sweeps of varying rates and directions) do not have a significant effect on BM transfer function magnitudes and group delays. While the error bars are not shown for the sake of clarity, the 95% CIs are typically within ±0.5–2 dB and ±0.05–0.2 periods for the magnitude and delay, respectively, at probe frequencies near and below CF. At high frequencies (>CF), the 95% CIs are wider as the transfer function magnitude slopes abruptly toward the noise floor.
FIG. 6.

(Color online) Average transfer function magnitudes (upper) and phase-gradient group delays (bottom) for ten animals, measured with clicks (closed circles), discrete tones (open circles) and sweeps (lines) before death (colors) and after death (grayscale; see the legends on the upper panels) for varying intensities [listed in (A); dB SPL/dB peSPL]. The group delays (in stimulus periods) for each probe level were offset by two periods for clarity. The reference zero line for each set of curves is plotted as a dotted horizontal line. The 95% CI error bars are omitted for better visibility.
Figure 7 summarizes properties of BM transfer functions measured across all of the stimulus conditions. In most cases, the duration/type of the stimulus (x-axis) or direction of the sweep (solid vs dashed) does not have significant effects on the transfer function parameters such as BF, gain at BF, phase-gradient group delay at BF, and sharpness of tuning (QERB; see the caption for Fig. 7). One exception is the set of parameters extracted from 75-dB peSPL click responses [Fig. 7(A), light green circle], where the BF tends to be higher in frequency as compared to the BFs extracted from sweep (lines) or tone (squares) transfer functions. As other parameters were extracted with reference to the BF, the click-derived transfer functions trend toward having longer delays [Fig. 7(C)] and sharper tuning [Fig. 7(D)] at the highest intensities.
FIG. 7.

(Color online) The average (±95% CI) properties of BM transfer functions measured across a full range of stimulus parameters: duration (x-axis; see Table I for the specific sweep durations/rates), intensity (colors), and sweep direction (solid vs dashed) in ten animals. Post mortem data appear in black. The gain and phase-gradient group delay were extracted at the BF of each individual transfer function. Sharpness of tuning is expressed as QERB, which is defined as the BF divided by the equivalent rectangular bandwidth (ERB) of the BM response.
2. Temporal properties of the BM impulse responses
Figure 8 shows the averaged impulse response envelope gain (top) and normalized instantaneous frequency (bottom) extracted with the Hilbert transform. The error bars are omitted for clarity, and only significant differences are marked. In the left column, click impulse responses are compared to fast upward and downward sweep data. Middle and right columns compare the effects of sweep direction on the impulse response characteristics for fast and slow sweeps, respectively.
FIG. 8.

(Color online) The average envelope and instantaneous frequency of impulse responses measured with click and sweep stimuli before death (colors) and after death (grayscale; see the legend on the upper panels). The average instantaneous frequency (normalized to CF) for each probe level was offset for clarity. The reference lines (i.e., instantaneous frequency equal to CF) for each set of curves are plotted as dotted horizontal lines. The 95% CI error bars are omitted for clarity; significant differences are marked with asterisks.
Over the first ∼4–6 cycles of the impulse response, the gain is equivalent for the different stimulus types. The first cycle of the envelope overlaps across stimulus levels (when measurable), indicating a linear growth of the cochlear response [e.g., Figs. 8(A), 8(C), and 8(E)]. The peak of the envelope shifts toward later times with decreasing stimulus level, which is consistent with an increase in the phase-gradient group delay at the CF. At later times (>6 CF periods), a significant effect of the stimulus type on the impulse response envelope emerges: When either click or upward fast sweeps are used, the impulse response rings over a longer duration [Figs. 8(A) and 8(C)] than when using the other stimuli. In contrast, downward fast sweeps, or slow sweeps of any direction, result in statistically indistinguishable impulse responses [e.g., Fig. 8(E)]. The 95% CIs for the average envelope gain are of similar magnitude across stimuli when compared at equivalent stimulus levels (the lower the stimulus level, the wider the 95% CI). When expressed as a percent of the mean response, the 95% CIs are typically within ±10%–20% around the mean across all of the conditions. Thus, any significant differences reported here do not appear to be the result of differences in variability of the responses across the stimulus types.
The instantaneous frequency of the impulse responses in live animals displays the characteristic upward frequency glide, increasing from ∼0.25 CF to ∼CF over the first ∼2–3 CF periods of the response [Figs. 8(B), 8(D), and 8(F), colors; see also Recio et al., 1998; Shera, 2001a]. There are no significant effects of stimulus type or level on the instantaneous frequency, which is consistent with the view that cochlear dispersion is mainly determined by passive acoustic properties (see Altoè and Shera, 2020; Shera, 2001b). Post mortem, the glide persists over the first ∼1–2 cycles, but at later times, the instantaneous frequency plateaus near 0.5–0.6 CF, which is consistent with a large downward shift in BF in the magnitude spectra [e.g., Fig. 6(A)]. Typically, the 95% CIs are between ±0.01 and ±0.05 around the average instantaneous frequency/CF across all of the conditions.
Figure 9 summarizes the characteristics of BM impulse responses measured across all of the stimulus conditions (except for the pure tones). The major stimulus effect is on the ringing portion of the impulse response, parametrized here as the ratio of the median envelope gain at times spanning 6.5–7.5 CF periods to the gain at the peak (“tail-to-peak” ratio). The higher the ratio, the stronger the ringing. Increasing the rate of the upward 60- and 80-dB SPL sweeps increases the tail-to-peak ratio [Fig. 9(D), solid green and lime] to a degree observed for click impulse responses. The effect of the sweep rate on the tail-to-peak ratio is not significant for downward sweeps (note that the data are also noisier given that the ringing portion of the response was of a lower magnitude).
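The tail-to-peak ratio described above can be sketched in a few lines of Python. The sketch assumes that the envelope gain is expressed in linear units and that the time axis is already expressed in CF periods; the function name `tail_to_peak_ratio` and the synthetic envelope are hypothetical.

```python
import numpy as np


def tail_to_peak_ratio(envelope_gain, time_cf_periods, tail_window=(6.5, 7.5)):
    """Median envelope gain within the tail window divided by the peak gain."""
    gain = np.asarray(envelope_gain, dtype=float)
    t = np.asarray(time_cf_periods, dtype=float)
    in_tail = (t >= tail_window[0]) & (t <= tail_window[1])
    if not np.any(in_tail):
        raise ValueError("no samples fall within the tail window")
    return float(np.median(gain[in_tail]) / np.max(gain))


# Hypothetical envelope: a main lobe near 2 CF periods plus a weaker,
# later "ringing" lobe near 7 CF periods (illustration only).
t = np.arange(0.0, 10.0, 0.05)
env = np.exp(-0.5 * ((t - 2.0) / 0.8) ** 2) + 0.2 * np.exp(-0.5 * ((t - 7.0) / 1.0) ** 2)
ratio = tail_to_peak_ratio(env, t)
```

Using the median rather than the mean within the tail window makes the estimate less sensitive to isolated noisy samples in the low-amplitude tail.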
FIG. 9.

(Color online) The average (±95% CI) properties of BM impulse responses measured with clicks (circles) and sweeps (fast to slow, x-axis; see Table I) for varying intensities (colors) and sweep directions (solid vs dashed) in ten animals. The gain, delay, and instantaneous frequency were all extracted at the peak of the impulse response envelope. The tail-to-peak ratio represents the ratio of the median envelope gain over the time frame of 6.5–7.5 CF periods to the gain at the peak of the envelope.
Other parameters characterizing the impulse response (gain at the peak, delay at the peak, and instantaneous frequency at the peak) do not vary significantly with either sweep direction or stimulus duration, with one exception: The peak gain for the highest level of clicks tends to be lower as compared to the sweep data [Fig. 9(A), lime]. We do not consider this difference important given that the click and sweep levels were matched empirically in post-processing. In all of the cases, the parameters change with the stimulus intensity: With decreasing intensity, the gain and delay at the peak increase [Figs. 9(A) and 9(B); lime to purple], which is consistent with the shift of the envelope peak toward later times at lower stimulus levels (Fig. 8). Similarly, the instantaneous frequency at the peak decreases relative to the CF with increasing intensity [Fig. 9(C)], which is consistent with the downward shift of BF with increasing level (Fig. 7).
To estimate cochlear dispersion, we calculated the rate of change of the instantaneous frequency over a time window of 1.5–2.5 CF periods at high stimulus intensities, where the data were the cleanest. There is a considerable amount of variability across animals, with the average dispersion rates across stimulus conditions ranging from 0.16 to 0.21 oct/cycle (with 95% CIs rarely exceeding ±0.015 oct/cycle), corresponding to sweep rates of 1.3–1.8 oct/ms (95% CIs typically less than ±0.2 oct/ms) in live preparations. The highest rates are observed for the click data, and the lowest rates are observed for downward sweep stimuli. Note that our fastest sweep stimulus (0.72 oct/ms) is still slower by roughly a factor of 2 as compared to cochlear dispersion rates. Following death, the average dispersion rate decreases significantly to ∼0.07–0.08 (±0.01) oct/cycle, or 0.6–0.7 (±0.1) oct/ms. This is not surprising given the considerable downward shift in the BF post mortem (i.e., the instantaneous frequency of post mortem impulse responses does not reach the CF).
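As a rough illustration of the dispersion estimate described above, the following Python sketch fits a straight line to the base-2 logarithm of the normalized instantaneous frequency over the 1.5–2.5 CF-period window and converts the slope from oct/cycle to oct/ms. The least-squares fit is an assumption (the exact fitting procedure is not stated here), and the function name, the synthetic glide, and the CF value are hypothetical. The unit conversion relies only on the fact that one CF period lasts 1/CF ms when CF is given in kHz, so that the rate in oct/ms equals the rate in oct/cycle multiplied by CF in kHz.

```python
import numpy as np


def dispersion_rate(inst_freq_over_cf, time_cf_periods, cf_khz, window=(1.5, 2.5)):
    """Return the glide slope in (oct/cycle, oct/ms) within the time window."""
    f = np.asarray(inst_freq_over_cf, dtype=float)
    t = np.asarray(time_cf_periods, dtype=float)
    sel = (t >= window[0]) & (t <= window[1])
    # Slope of log2(frequency) vs time in CF periods gives octaves per cycle.
    oct_per_cycle = float(np.polyfit(t[sel], np.log2(f[sel]), 1)[0])
    # There are cf_khz cycles per millisecond, hence the multiplication.
    return oct_per_cycle, oct_per_cycle * cf_khz


# Hypothetical upward glide of 0.2 oct/cycle that reaches CF at 3 CF periods,
# evaluated for an illustrative CF of 10 kHz (not a value from this study).
t = np.linspace(0.0, 4.0, 81)
f_norm = 2.0 ** (0.2 * (t - 3.0))
rate_cycle, rate_ms = dispersion_rate(f_norm, t, cf_khz=10.0)
```

Because the conversion scales with CF, the same rate in oct/cycle corresponds to a lower rate in oct/ms at lower-frequency cochlear locations.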
IV. CONCLUSIONS
We have shown that sweeps and more traditional stimuli, such as pure tones and clicks, produce nearly identical cochlear impulse responses when compared under most equivalent conditions in terms of stimulus level and sweep rate. In the frequency domain, the transfer function parameters varied little across the stimulus space when evaluated at equivalent SPLs. However, in the time domain, the impulse responses acquired with moderate- to high-level clicks and fast upward sweeps rang significantly longer than those acquired with downward and/or slow sweeps. The amplitudes of the ringing portions of the impulse responses at mid to high stimulus levels were relatively small compared to the “main” response (i.e., ∼10%–30% of peak gain), which is why the effects on transfer function magnitudes were negligible in the frequency domain. The ringing portion of the impulse response has been hypothesized to correspond to internal reflection of the traveling wave that, in part, escapes the cochlea as an OAE (Shera and Cooper, 2013). In other species (e.g., chinchillas), the amplitude of the ringing portion can reach 50% of the peak amplitude (Recio et al., 1998; Shera and Cooper, 2013), which is consistent with the idea that these animals are better “emitters” as compared to mice, which are known for having relatively small reflection-type emissions (Siegel et al., 2011). Finally, we did not observe clear ringing at low stimulus levels. The interference between the ringing and main responses causes wiggles in the cochlear transfer function that may affect the tonotopic map of the cochlea (Shera, 2015). We observed that for sweep rates approaching the rate of natural cochlear dispersion, the ringing portion of the impulse response was reduced for downward but not for upward sweeps, for which impulse responses looked more like responses measured with clicks.
The direction of cochlear dispersion at the measurement location is upward (i.e., instantaneous frequency changes from below CF to CF). Thus, upward sweeps tend to work with cochlear dispersion, while downward sweeps work against it in our preparation. Recio and Rhode (2000) demonstrated that, indeed, the BM response to Schroeder-phase complexes, whose frequency changes linearly over time, is “peakier” for downward sweep stimuli (i.e., more compressed in time) as compared to upward sweep stimuli, where responses were spread over a longer duration. This time-compression of the cochlear response to fast downward sweeps may create the potential for mutual suppression due to temporal overlap and inhibit the generation of the ringing response (e.g., Charaziak et al., 2020). Although it was not the main goal of the study, our data support the view that the differences between cochlear impulse responses derived from steady vs transient stimuli originate in the dispersive nature of the wave propagation in the cochlea.
From a methodological standpoint, we recommend that when deciding on the specific sweep stimulus parameters, the experimenter should consider species- and location-dependent properties of cochlear dispersion. Here, we showed that “slow” sweeps (i.e., slow relative to the natural dispersion rate in mice) of either direction produce cochlear transfer functions equivalent to ones measured with pure tones. In contrast, “fast” sweep rates (i.e., approaching the natural rate of dispersion) produce click-like impulse responses but only if the sweep direction agrees with the direction of cochlear dispersion at the measurement location. The direction of cochlear dispersion (as well as its rate) varies with cochlear location, such that more apical locations display downward frequency glides, while upward glides are measured more basally, with a transition region in between that shows no glide at all (Carney et al., 1999; Temchin et al., 2011). To complicate things even further, the location of the apical-basal transition varies across species, and hence, so does the cochlear delay (Joris et al., 2011; Shera et al., 2010). For instance, the anatomical apex in mice does not appear to have apical-like mechanics, suggesting that from a functional standpoint, the entire mouse cochlea acts as “base-like” (i.e., there is effectively no clear apical-basal transition; Cheatham, 2021). Thus, to obtain more click-like vs tone-like cochlear impulse responses using sweeps, one must select the sweep stimulus parameters relative to the natural dispersion characteristics for the given species and cochlear location.
ACKNOWLEDGMENTS
We thank Dr. Christopher Shera and two anonymous reviewers for helpful comments on the manuscript. This work was supported by Grant Nos. K99/R00 DC016906 (K.K.C.) and R21 DC019712 (A.A.) from the National Institutes of Health (NIH) and USC.
Footnotes
Despite these differences and for simplicity, to describe cochlear responses in time and frequency domains, we adapt the terminology used for the linear systems, i.e., impulse response and transfer function, respectively. In reality, these are “synthetic” impulse responses as they depend on the stimulus parameters.
We define the rate, r (oct/ms), as it is typically specified as a “change over time.” In contrast, Farina (2000) expresses rate, L, as a “change over frequency” (ms/oct). Thus, L = 1/r.
Novak et al. (2015) further simplify the generation of the synchronized swept-tone stimulus by modifying Eq. (1) such that the synchronization can be achieved for any desired duration of the stimulus, eliminating the need for the rounding procedure in Eq. (2).
A desired SNR can be accomplished without averaging by designing a sweep of appropriately long duration (Farina, 2000). The total averaging time (i.e., number of averages multiplied by the sweep duration) is what should define the final SNR. In practice, using many short sweeps instead of one long sweep has the advantage of easy removal of artifactual/noisy buffers without sacrificing as much of the averaging time.
References
- 1. Abdala, C., Luo, P., and Shera, C. A. (2015). “Optimizing swept-tone protocols for recording distortion-product otoacoustic emissions in adults and newborns,” J. Acoust. Soc. Am. 138, 3785–3799. 10.1121/1.4937611
- 2. Altoè, A., and Shera, C. A. (2020). “The cochlear ear horn: Geometric origin of tonotopic variations in auditory signal processing,” Sci. Rep. 10, 20528. 10.1038/s41598-020-77042-w
- 3. Bennett, C. L., and Özdamar, Ö. (2010). “Swept-tone transient-evoked otoacoustic emissions,” J. Acoust. Soc. Am. 128, 1833–1844. 10.1121/1.3467769
- 4. Carney, L. H., McDuffy, M. J., and Shekhter, I. (1999). “Frequency glides in the impulse responses of auditory-nerve fibers,” J. Acoust. Soc. Am. 105, 2384–2391. 10.1121/1.426843
- 5. Carney, L. H., and Yin, T. C. T. (1988). “Temporal coding of resonances by low-frequency auditory nerve fibers: Single-fiber responses and a population model,” J. Neurophysiol. 60, 1653–1677. 10.1152/jn.1988.60.5.1653
- 6. Charaziak, K. K., Dong, W., Altoè, A., and Shera, C. A. (2020). “Asymmetry and microstructure of temporal-suppression patterns in basilar-membrane responses to clicks: Relation to tonal suppression and traveling-wave dispersion,” J. Assoc. Res. Otolaryngol. 21, 151–170. 10.1007/s10162-020-00747-2
- 7. Charaziak, K. K., and Shera, C. A. (2021). “Reflection-source emissions evoked with clicks and frequency sweeps: Comparisons across levels,” J. Assoc. Res. Otolaryngol. 22, 641–658. 10.1007/s10162-021-00813-3
- 8. Cheatham, M. A. (2021). “Comparing spontaneous and stimulus frequency otoacoustic emissions in mice with tectorial membrane defects,” Hear. Res. 400, 108143. 10.1016/j.heares.2020.108143
- 9. Cooper, N. P. (1998). “Harmonic distortion on the basilar membrane in the basal turn of the guinea-pig cochlea,” J. Physiol. 509(1), 277–288. 10.1111/j.1469-7793.1998.277bo.x
- 10. Dewey, J. B., Altoè, A., Shera, C. A., Applegate, B. E., and Oghalai, J. S. (2021). “Cochlear outer hair cell electromotility enhances organ of Corti motion on a cycle-by-cycle basis at high frequencies in vivo,” Proc. Natl. Acad. Sci. U.S.A. 118, e2025206118. 10.1073/pnas.2025206118
- 11. Dewey, J. B., Applegate, B. E., and Oghalai, J. S. (2019). “Amplification and suppression of traveling waves along the mouse organ of Corti: Evidence for spatial variation in the longitudinal coupling of outer hair cell-generated forces,” J. Neurosci. 39, 1805–1816. 10.1523/JNEUROSCI.2608-18.2019
- 12. Farina, A. (2000). “Simultaneous measurement of impulse response and distortion with a swept-sine technique,” in 108th Audio Engineering Society Convention, Paris.
- 13. Farina, A. (2007). “Advancements in impulse response measurements by sine sweeps,” in 112th Audio Engineering Society Convention, New York.
- 14. Gao, S. S., Wang, R., Raphael, P. D., Moayedi, Y., Groves, A. K., Zuo, J., Applegate, B. E., and Oghalai, J. S. (2014). “Vibration of the organ of Corti within the cochlear apex in mice,” J. Neurophysiol. 112, 1192–1204. 10.1152/jn.00306.2014
- 15. Joris, P. X., Bergevin, C., Kalluri, R., Mc Laughlin, M., Michelet, P., van der Heijden, M., and Shera, C. A. (2011). “Frequency selectivity in Old-World monkeys corroborates sharp cochlear tuning in humans,” Proc. Natl. Acad. Sci. U.S.A. 108, 17516–17520. 10.1073/pnas.1105867108
- 16. Kalluri, R., and Shera, C. A. (2007). “Near equivalence of human click-evoked and stimulus-frequency otoacoustic emissions,” J. Acoust. Soc. Am. 121, 2097–2110. 10.1121/1.2435981
- 17. Keefe, D. H., Feeney, M. P., Hunter, L. L., and Fitzpatrick, D. F. (2016). “Comparisons of transient evoked otoacoustic emissions using chirp and click stimuli,” J. Acoust. Soc. Am. 140, 1949–1973. 10.1121/1.4962532
- 18. Lin, N. C., Hendon, C. P., and Olson, E. S. (2017). “Signal competition in optical coherence tomography and its relevance for cochlear vibrometry,” J. Acoust. Soc. Am. 141, 395–405. 10.1121/1.4973867
- 19. Long, G. R., Talmadge, C. L., and Lee, J. (2008). “Measuring distortion product otoacoustic emissions using continuously sweeping primaries,” J. Acoust. Soc. Am. 124, 1613–1626. 10.1121/1.2949505
- 20. Novak, A., Lotton, P., and Simon, L. (2015). “Synchronized swept-sine: Theory, application, and implementation,” J. Audio Eng. Soc. 63, 786–798. 10.17743/jaes.2015.0071
- 21. Recio, A., and Rhode, W. (2000). “Basilar membrane responses to broadband stimuli,” J. Acoust. Soc. Am. 108, 2281–2298. 10.1121/1.1318898
- 22. Recio, A., Rich, N. C., Narayan, S. S., and Ruggero, M. A. (1998). “Basilar-membrane responses to clicks at the base of the chinchilla cochlea,” J. Acoust. Soc. Am. 103, 1972–1989. 10.1121/1.421377
- 23. Recio-Spinoso, A., Narayan, S. S., and Ruggero, M. A. (2009). “Basilar membrane responses to noise at a basal site of the chinchilla cochlea: Quasi-linear filtering,” J. Assoc. Res. Otolaryngol. 10, 471–484. 10.1007/s10162-009-0172-0
- 24. Rhode, W. S., and Recio, A. (2001). “Basilar-membrane response to multicomponent stimuli in chinchilla,” J. Acoust. Soc. Am. 110, 981–994. 10.1121/1.1377050
- 25. Shera, C. A. (2001a). “Intensity-invariance of fine time structure in basilar-membrane click responses: Implications for cochlear mechanics,” J. Acoust. Soc. Am. 110, 332–348. 10.1121/1.1378349
- 26. Shera, C. A. (2001b). “Frequency glides in click responses of the basilar membrane and auditory nerve: Their scaling behavior and origin in traveling-wave dispersion,” J. Acoust. Soc. Am. 109, 2023–2034. 10.1121/1.1366372
- 27. Shera, C. A. (2015). “The spiral staircase: Tonotopic microstructure and cochlear tuning,” J. Neurosci. 35, 4683–4690. 10.1523/JNEUROSCI.4788-14.2015
- 28. Shera, C. A., and Cooper, N. P. (2013). “Basilar-membrane interference patterns from multiple internal reflection of cochlear traveling waves,” J. Acoust. Soc. Am. 133, 2224–2239. 10.1121/1.4792129
- 29. Shera, C. A., Guinan, J. J., and Oxenham, A. J. (2010). “Otoacoustic estimation of cochlear tuning: Validation in the chinchilla,” J. Assoc. Res. Otolaryngol. 11, 343–365. 10.1007/s10162-010-0217-4
- 30. Shera, C. A., and Zweig, G. (1993). “Noninvasive measurement of the cochlear traveling-wave ratio,” J. Acoust. Soc. Am. 93, 3333–3352. 10.1121/1.405717
- 31. Siegel, J. H., Charaziak, K., and Cheatham, M. A. (2011). “Transient- and tone-evoked otoacoustic emissions in three species,” AIP Conf. Proc. 1403, 307–314. 10.1063/1.3658103
- 32. Summers, V., de Boer, E., and Nuttall, A. L. (2003). “Basilar-membrane responses to multicomponent (Schroeder-phase) signals: Understanding intensity effects,” J. Acoust. Soc. Am. 114, 294–306. 10.1121/1.1580813
- 33. Temchin, A. N., Recio-Spinoso, A., and Ruggero, M. A. (2011). “Timing of cochlear responses inferred from frequency-threshold tuning curves of auditory-nerve fibers,” Hear. Res. 272, 178–186. 10.1016/j.heares.2010.10.002
- 34. van der Heijden, M., and Joris, P. X. (2003). “Cochlear phase and amplitude retrieved from the auditory nerve at arbitrary frequencies,” J. Neurosci. 23, 9194–9198. 10.1523/JNEUROSCI.23-27-09194.2003
- 35. Versteegh, C., and van der Heijden, M. (2012). “Basilar membrane responses to tones and tone complexes: Nonlinear effects of stimulus intensity,” J. Assoc. Res. Otolaryngol. 13, 785–798. 10.1007/s10162-012-0345-0



