Abstract
The DPOAE response consists of the linear superposition of two components: a nonlinear distortion component generated in the overlap region, and a reflection component generated by roughness in the DP resonant region. Due to approximate scaling symmetry, the DPOAE distortion component has approximately constant phase. As the reflection component may be considered an SFOAE generated by the forward DP traveling wave, it has rapidly rotating phase relative to that of its source, which is in turn equal to the phase of the DPOAE distortion component. This different phase behavior permits effective separation of the DPOAE components (unmixing) using time-domain or time-frequency domain filtering. Departures from scaling symmetry imply fluctuations of the distortion component delay around zero, which may seriously jeopardize the accuracy of these filtering techniques. The differential phase-gradient delay of the reflection component obeys causality requirements, i.e., the delay is positive only, and the fine-structure oscillations of amplitude and phase are correlated with each other, as happens for TEOAEs and SFOAEs relative to their stimulus phase. If the inverse Fourier (or wavelet) transform is applied to a modified DPOAE complex spectrum, in which a constant phase function is substituted for the measured one, the time (or time-frequency) distribution shows a peak at exactly zero delay and long-latency mirror-symmetric components with a modified (positive and negative) delay, namely the delay relative to that of the distortion component in the original response. Component separation, applied to this symmetrized distribution, becomes insensitive to systematic errors associated with violation of the scaling symmetry in specific frequency ranges.
INTRODUCTION
Time-frequency (t-f) filtering [1] provides an effective method for separating the distortion and reflection components of fDP = 2f1 − f2 distortion product otoacoustic emissions (DPOAEs), based on their different phase-frequency relations. The method is almost equivalent to applying filtering in the time domain to the inverse Fourier transform (IFT) of the DPOAE complex spectrum, using different time windows in different frequency ranges [2], with the advantage of performing the filtering operation in a single step and of easily adapting the filtering t-f domains to the actual t-f distribution of the subject. Therefore, the results of this study also apply to the IFT filtering technique.
Both filtering techniques rest on the theoretical assumptions that the distortion DPOAE component arises from a wave-fixed mechanism, with a source localized in the f2 tonotopic region, whereas reflection components mostly arise from place-fixed reflection of the forward DP wave in a region close to the fDP tonotopic region. As a consequence, in a scale-invariant cochlea, the distortion component would have constant phase, whereas rapidly rotating phase is predicted for the reflection component, with a phase-gradient delay scaling as the reciprocal of frequency. This phenomenology is actually observed in the experimental data, with small but significant departures from the theoretical prediction. Indeed, scaling symmetry violation results in deviations from the constant phase predicted for the distortion component, and from the exact inverse proportionality between delay and frequency of the reflection components [2]. These deviations affect the performance of the filter, particularly if the same filtering regions are chosen for all subjects. As the distortion component is typically much stronger than the reflection component at commonly used stimulus levels and primary frequency ratios, the main risk is to include in the reflection component filtering region a significant contribution associated with distortion components of non-zero delay. In this study, we suggest an analysis method that could decrease this risk, and that would work for experimental DPOAE responses evoked by medium-to-high stimulus levels, for which the distortion component is the dominant one over the whole frequency range.
METHODS
We used experimental DPOAE data from young healthy subjects that had already been analysed, e.g., in [3], where all the acquisition details can be found. High frequency-resolution (20 Hz) complex spectra were obtained using slow (800 Hz/s) linear chirp stimuli in the 1–4 kHz range. The t-f filtering method is illustrated in [1]. Here we just recall that, starting from an OAE complex spectrum, effective separation of the distortion and reflection components may be achieved by selecting specific regions with curved boundaries in the time-frequency domain. The continuous wavelet transform provides an effective tool to represent the original data (recorded either in the time or frequency domain) in the joint t-f domain, where filtering is most conveniently performed, and to go back to the frequency domain to reconstruct the filtered spectra of each component. In this study, we do not focus on the filtering technique, but on a preliminary treatment of the DPOAE complex spectra that we propose to overcome some limitations of the filtering procedure.
How Does the DPOAE T-F Distribution Depend on the Phase Function?
The fine structure observed in the DPOAE complex spectra has been interpreted [4] as the result of interference between (at least) two components with different phase behavior. In the DPOAE case, one generally assumes that the observed response is the vector sum of a stronger distortion component with almost constant phase and a weaker reflection component with rapidly rotating phase. Although one usually focuses on the amplitude spectrum, the DPOAE phase function also shows a weakly oscillatory pattern (around an almost constant value) that is generally consistent with this interpretation. The t-f representation of the DPOAE response, obtained by applying the wavelet transform to the complex spectrum, consistently shows an almost zero-delay component and a positive-delay component with delay approximately proportional to the reciprocal of frequency.
Even without having measured the DPOAE phase, the amplitude modulation visible in the DPOAE fine structure could have been interpreted as due to interference between a zero-delay component and a causal time-shifted component of positive delay. In this interpretation, the frequency “quasi-period” of the spectral oscillations would encode the delay-frequency function of the causal delayed component. Note that the same spectral fine structure arises also when (at least two) causal multiple-delay components are present, without any zero-delay component (as happens for TEOAEs and SFOAEs). The experimental phase-frequency function further confirms this interpretation, being consistent with the assumed vector sum of two differently-delayed complex components.
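As a worked illustration of how the quasi-period encodes the delay, consider a deliberately simplified case: a unit-amplitude zero-delay component plus a delayed component of relative amplitude r < 1 and frequency-independent delay τ. The symbols r, τ and Δf are introduced here for illustration only; in real data the delay varies with frequency, so the relation holds only locally.

```latex
\begin{align}
P(f) &= 1 + r\,e^{-i 2\pi f \tau}, \\
|P(f)|^{2} &= 1 + r^{2} + 2r\cos(2\pi f \tau), \\
\Delta f &= \frac{1}{\tau}.
\end{align}
```

Under these assumptions, a delay of τ = 5 ms would produce amplitude ripples spaced by Δf = 200 Hz, and a longer delay would produce a finer ripple.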
In causal linear systems, the real and imaginary parts of the complex response are not independent of each other, and neither are the amplitude and phase functions, A(f) and φ(f). Note that we are not in the least assuming here that the cochlea, or the system generating the DPOAE response, is, even approximately, a linear system. What we want to understand is what happens if the original DPOAE complex signal is arbitrarily modified, altering in a specific way the experimental relation between phase and amplitude, whatever it originally might be. The specific alteration that we will consider is equivalent to assuming no components with a causal delay.
Experimental DPOAE spectra typically show a dominant wave-fixed component with delay close to zero. We can heuristically predict what happens if we artificially cancel the fingerprints of causality from the spectral phase function without correspondingly removing the amplitude oscillations. These oscillations are too fast to be followed by the poor frequency resolution of the wavelet transform, which will therefore interpret them as due to interference between the main component, now at exactly zero delay, and two components of equal amplitude whose phases rotate symmetrically, with opposite delays. In the SFOAE or TEOAE case, the same procedure would align the main reflection component along the zero-delay axis, with symmetrical multiple-reflection components at delays corresponding to the difference between the delay of the main component (generally, the first reflection) and those of the others (multiple intra-cochlear reflections).
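The heuristic prediction can be made explicit in a first-order sketch. Assume a unit-amplitude dominant component and a weak component of relative amplitude r ≪ 1 and constant relative delay τ (r and τ are illustrative symbols, not quantities defined in the study). Setting the phase to zero leaves only the amplitude A(f):

```latex
\begin{align}
A(f) &= \left|1 + r\,e^{-i 2\pi f \tau}\right|
      = \sqrt{1 + r^{2} + 2r\cos(2\pi f \tau)} \\
     &\approx 1 + r\cos(2\pi f \tau)
      = 1 + \frac{r}{2}\,e^{-i 2\pi f \tau} + \frac{r}{2}\,e^{+i 2\pi f \tau}.
\end{align}
```

To first order in r, the three terms correspond to a component at exactly zero delay and two mirror components at delays +τ and −τ, each with amplitude r/2.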
RESULTS AND DISCUSSION
The method used for unmixing the DPOAE components in a real experiment is illustrated in Fig.1 (left), which shows the time-frequency representation of the full DPOAE response of a healthy young subject. It may be noted that, although the distortion component is rather well separated from the reflection component in the t-f domain, significant deviations from the zero-delay prediction occur in the low-frequency range and around 3.5 kHz. The positive delay (1–2 ms) observed below 1.5 kHz is a well-known consequence of scaling symmetry breaking in the apical cochlea [2,3], but a perhaps more serious problem arises from individual delay fluctuations in the high-frequency region. Indeed, the time width of the “distortion” filtering region (dashed lines) is very narrow at high frequency, so even delay fluctuations of order 1 ms may cause contamination of the weak reflection component by the long-delay tails of the intense distortion component.
FIGURE 1.

Original time-frequency distribution of the DPOAE response of a young healthy ear (left). The snakelike fluctuations of the distortion component around zero delay are clearly visible. By replacing the actual phase response with zero, one gets a symmetrized response (right), in which the reflection component amplitude is split into two mirrored components, while the systematic deviations of the distortion component from null delay disappear, making unmixing easier. A particularly narrow region, shown by dashed lines, is chosen to unmix the distortion component. This deliberately non-optimal choice makes the effect of the proposed procedure more evident.
This problem could be addressed using adaptive filtering techniques, in which the individual t-f distribution of the DPOAE signal would be used to tailor a specifically shaped filtering region that would optimize component separation by following the snakelike experimental energy distribution in the t-f plane visible in Fig.1 (left). This possibility highlights the power of the t-f filtering approach, because only in the t-f domain can the shape of this filtering region be easily visualized and optimized. Nevertheless, this approach does not seem advisable, because tailoring the analyst's choices too tightly to the features of the data always carries the risk of introducing systematic errors, particularly for noisy data, as the situation would not generally be as crystal-clear as it looks in Fig.1.
If we replace the actual experimental phase function with a constant (e.g., φ = 0), the time-frequency distribution of the data, as shown in the right panel of Fig.1, is slightly altered: the distortion component is exactly aligned along the zero-delay axis, with no more deviations associated with scaling symmetry breaking, and the reflection component is split into two symmetrical regions, with half the energy of the original positive-delay component transferred to a negative-delay “mirror” counterpart. This exercise suggests a possible application to the unmixing of DPOAE components. Indeed, if we perform filtering on the modified t-f representation, we may more easily avoid contamination between the different components, owing to the increased symmetry of the t-f distribution that we artificially achieved.
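A minimal numerical sketch of this phase-replacement step is given here. It is not the authors' implementation: it uses a plain inverse FFT in place of the wavelet transform, and a hypothetical toy spectrum in which the distortion phase fluctuates slowly and the reflection component has a constant 8 ms delay relative to its source. The function names (`zero_phase`, `delay_domain`) and all parameter values except the 20 Hz step and the 1–4 kHz band are assumptions chosen for illustration.

```python
import numpy as np

def zero_phase(spectrum):
    """Keep the measured amplitude and replace the phase with the constant 0."""
    return np.abs(spectrum).astype(complex)

def delay_domain(spectrum, df):
    """Inverse FFT of a uniformly sampled spectrum.

    Returns the delay axis in seconds (FFT ordering, negative delays last)
    and the complex delay-domain signal.
    """
    n = spectrum.size
    return np.fft.fftfreq(n, d=df), np.fft.ifft(spectrum)

# Toy spectrum (assumed values, not experimental data).
df = 20.0                                  # frequency step in Hz
f = 1000.0 + df * np.arange(150)           # 1-4 kHz band
r, tau = 0.2, 8e-3                         # relative amplitude and delay of reflection

# Slowly fluctuating distortion phase: mimics departures from zero delay.
phi_d = 1.0 * np.sin(2 * np.pi * (f - f[0]) / 1500.0)
distortion = np.exp(1j * phi_d)

# Reflection modelled as a delayed copy of its source (the distortion component).
reflection = r * distortion * np.exp(-2j * np.pi * f * tau)
p_orig = distortion + reflection

t, x_orig = delay_domain(p_orig, df)
_, x_sym = delay_domain(zero_phase(p_orig), df)

k = int(round(tau * f.size * df))          # delay bin of the reflection component
print("delay at bin k (ms):", 1e3 * t[k])
print("original   |x| at 0, +tau, -tau:",
      abs(x_orig[0]), abs(x_orig[k]), abs(x_orig[-k]))
print("zero-phase |x| at 0, +tau, -tau:",
      abs(x_sym[0]), abs(x_sym[k]), abs(x_sym[-k]))
```

In this toy case the zero-phase version concentrates the dominant component at zero delay and shows mirror peaks at ±τ, while the original has essentially nothing at −τ. A constant delay is used only to keep the peaks on exact FFT bins; a delay scaling as the reciprocal of frequency would require a t-f representation to be seen clearly.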
Using the same filtering regions for the two t-f distributions of Fig.1 yields the filtered spectra of Fig.2, where one can appreciate that in the high-frequency region, after the proposed pre-processing method (thick lines), the estimated reflection component decreases, while the estimated distortion component correspondingly increases, with respect to the original data (thin lines). This effect is clearly due to having aligned the distortion component “snake” along the zero-delay axis in Fig.1, removing its contamination of the estimated reflection component.
FIGURE 2.

Effect of the proposed pre-processing on the spectra of the unmixed components. The distortion component slightly increases (red thick line) and the reflection component decreases (black thick line) with respect to the original spectra (thin lines) because the distortion component is more effectively “confined” in the filtering region around the zero-delay axis of Fig.1.
Of course, this procedure alters the phase (and the delay) of both components. One must consider, however, that the reflection component is like an SFOAE evoked by the distortion component [5], so fluctuations of the distortion component phase-gradient delay act on that of the reflection component as fluctuations of the phase gradient of an SFOAE stimulus act on that of the SFOAE response, i.e., they are a spurious effect to be removed or accounted for. Therefore, our procedure of rectifying the phase of the distortion component is equivalent to computing the phase difference between response and stimulus, so the phase-gradient delay of the modified reflection component is the “correct” one, if interpreted as that of the “frequency response of the reflection mechanism”, as in the SFOAE and TEOAE cases.
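This equivalence can be written out under the assumption that the distortion component dominates, so that the phase of the total response is approximately the distortion phase. The symbols D, R, A_D, A_R, φ_D, φ_R and the primed quantities are notation introduced here for illustration:

```latex
\begin{align}
P(f) &= D(f) + R(f), \qquad
D = A_D\,e^{i\varphi_D}, \quad R = A_R\,e^{i\varphi_R}, \quad A_D \gg A_R, \\
\arg P(f) &\approx \varphi_D(f), \\
R'(f) &\approx A_R(f)\,e^{i[\varphi_R(f) - \varphi_D(f)]}, \\
\tau'_R(f) &= -\frac{1}{2\pi}\frac{d}{df}\bigl[\varphi_R(f) - \varphi_D(f)\bigr]
            = \tau_R(f) - \tau_D(f).
\end{align}
```

In this approximation, the delay of the modified reflection component is the original reflection delay minus the (fluctuating) distortion delay, which is the response-minus-stimulus delay one would compute for an SFOAE.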
The method breaks down if the almost-zero-delay distortion component is not dominant over the whole spectral range. Indeed, in any case, the dominant component is the one that this procedure aligns along the zero-delay axis, with the lower-intensity components forming mirror-symmetric images above and below it. If the hierarchy among the components is not stable across frequencies, the result may be confusing. In the SFOAE case the distortion component does not exist; therefore, this procedure would align the dominant single-reflection component along the zero-delay axis, with symmetrical bands above and below corresponding to the fainter multiple-reflection components. Even in this case, the procedure could be useful for more effectively unmixing the single-reflection component amplitude spectrum from the multiple reflections and from the spurious zero-delay components associated with incomplete cancellation of the probe stimulus.
CONCLUSIONS
A simple pre-processing procedure for DPOAE spectra is proposed, meant to optimize the separation algorithms based on time-frequency domain filtering or, equivalently, on time-domain filtering of the IFTs of overlapping spectral intervals. The method is effective for most DPOAE spectra, as long as the dominant component over the whole frequency range is the wave-fixed one, associated with nonlinear distortion. This condition is generally satisfied, with the exception of DPOAE spectra recorded using very low stimulus levels in subjects with particularly high gain of the cochlear amplifier.
REFERENCES
1. Moleti A, Longo F, Sisto R (2012). “Time-frequency domain filtering of evoked otoacoustic emissions,” J. Acoust. Soc. Am. 132, 2455–2467.
2. Dhar S, Rogers A, Abdala C (2011). “Breaking away: Violation of distortion emission phase-frequency invariance at low frequencies,” J. Acoust. Soc. Am. 129, 3115–3122.
3. Moleti A, Pistilli D, Sisto R (2017). “Evidence for apical-basal transition in the delay of the reflection components of otoacoustic emissions,” J. Acoust. Soc. Am. 141, 116–126.
4. Shera CA, Guinan JJ Jr. (1999). “Evoked otoacoustic emissions arise by two fundamentally different mechanisms: a taxonomy for mammalian OAEs,” J. Acoust. Soc. Am. 105, 782–798.
5. Kalluri R, Shera CA (2007). “Near equivalence of human click-evoked and stimulus-frequency otoacoustic emissions,” J. Acoust. Soc. Am. 121, 2097–2110.
