Abstract
The auditory brainstem response (ABR) is a sound-evoked neural response commonly used to assess auditory function in humans and laboratory animals. ABR thresholds are typically chosen by visual inspection, leaving the procedure susceptible to user bias. We sought to develop an algorithm to automate determination of ABR thresholds, to eliminate such biases and to standardize approaches across investigators and laboratories. Two datasets of mouse ABR waveforms obtained from previously published studies of normal ears as well as ears with varying degrees of cochlear-based threshold elevations (Maison et al., 2013; Sergeyenko et al., 2013) were reanalyzed using an algorithm based on normalized cross-covariance of adjacent level presentations. Correlation-coefficient vs. level data for each ABR level series were fit with both a sigmoidal function and a two-term power function. From these fits, threshold was interpolated at different criterion values of correlation coefficient ranging from 0 to 0.5. The criterion value of 0.35 was selected by comparing visual thresholds to computed thresholds across all frequencies tested. With such a criterion, the mean algorithm-computed thresholds were comparable to the visual thresholds noted by two independent observers for each dataset. The success of the algorithm was also qualitatively assessed by comparing averaged waveforms at the thresholds determined by the two methods, and quantitatively assessed by comparing peak 1 amplitude growth functions expressed as dB re each of the two threshold measures. Application of a cross-covariance analysis to ABR waveforms can emulate visual thresholding decisions made by highly trained observers. Unlike previous applications of similar methodologies using template matching, our algorithm performs only intrinsic comparisons within ABR sets, and is therefore more robust to equipment and investigator differences in assessing waveforms, as evidenced by similar results across the two datasets.
Keywords: Auditory, Hearing, Threshold, Automatic, Algorithm, Correlation
1. Introduction
The auditory brainstem response (ABR) is a sound-evoked neural response commonly used to assess auditory function in humans and laboratory animals. Scalp or needle electrodes record averaged gross potentials to an acoustic transient, as the response is conveyed through the ascending auditory pathway. The acoustic stimuli, usually clicks or short tone bursts, are presented at a series of sound-pressure levels until the evoked response exhibits clearly defined peaks and troughs.
Lesion studies in mice (Henry, 1979a; Henry, 1979b; Henry, 1979c; Land et al., 2016) and cats (Achor et al., 1980a; Achor et al., 1980b; Melcher et al., 1996a; Melcher et al., 1996b; Melcher et al., 1996c) have helped identify the cellular generators underlying each peak. In mice, five positive peaks are generally observed: Wave 1 – generated in the auditory nerve, Wave 2 – dominated by the cochlear nucleus, Wave 3 – with contributions from cochlear nucleus and superior olivary complex, Wave 4 – originating in the vicinity of the periolivary and lateral lemniscal nuclei, and Wave 5 – from the inferior colliculus. While changes in peak amplitudes and latencies can indicate dysfunction in contributing central auditory nuclei (Lewis et al., 2015; Mehraei et al., 2016; Ridley et al., 2018), ABRs are primarily used to determine auditory thresholds when behavioral tests are impractical, as with infants or anesthetized animals.
Despite the utility of ABRs as a non-invasive hearing assessment, visual determination of their thresholds is inherently subjective and thus susceptible to interpretation error and/or observer bias (Vidler et al., 2004). Threshold is commonly defined, by visual inspection of “stacked” waveforms, as the lowest sound-pressure level at which consistent peaks or troughs are observed (Figure 1A,D). Our definition of “consistent” includes the criterion that the latency of each peak increases as sound level decreases. However, because ABR potentials are in the microvolt range, the boundary between signal and noise can be ambiguous, especially in cases of cochlear damage, where the range over which response amplitude grows with level is greatly reduced (Gorga et al., 1985; Kirsh et al., 1992; Liberman et al., 2014; Liberman et al., 2015; Madsen et al., 2018; Maison et al., 2013; Rasetshwane et al., 2013; Sergeyenko et al., 2013; Walsh et al., 1992).
Figure 1. Visual thresholds assigned by two observers on two datasets.
Both datasets consisted of ABR runs (level series in 5 dB steps) at 7-10 frequencies from ~5.6- to 45.2-kHz. Dataset 1 (A,B,C) consisted of 1230 runs from 25 mice aged 8-45 wks, some with OC bundle transection at 6 wks (Maison et al., 2013). Dataset 2 (D,E,F) included 416 runs from 46 normal-aging mice aged 6-104 wks (Sergeyenko et al., 2013). A,D: Example ABR runs at 5.6- and 12.1-kHz, respectively, showing stacked waveforms (tone-pip SPLs indicated), positions of waves 1 – 5, and visual thresholds (bolded) for Observers 1 & 2. B,E: Scatterplots comparing thresholds for each run selected by each observer, along with the line indicating perfect agreement (dashed grey), lines indicating ±20 dB SPL (dotted grey) and the best-fit line to the results (solid black). C,F: Histograms of the inter-observer discrepancies for each dataset.
Although the subjectivity of ABR threshold determination has been acknowledged for decades (Vidler et al., 2004; Weber, 1983), attempts to objectify and optimize ABR detection using statistical methods and modeling have failed to achieve widespread adoption in either research or clinical settings. The majority of prior attempts to automate threshold determination have focused on human ABRs, where waveforms can be more difficult to interpret due to variability between subjects. Past statistical approaches have included Pearson product-moment correlation (Arnold, 1985; Weber et al., 1980), multiple z-tests between observers (Arnold, 1985), cross-correlation against a template (Cone-Wesson et al., 1997; Davey et al., 2003; Davey et al., 2007; Elberling, 1979) or interleaved responses (Berninger et al., 2014), variance ratios (FSP; Cebulla et al., 2000; Don et al., 1984; Elberling et al., 1984; Sininger, 1993), and signal-to-noise ratios (Bogaerts et al., 2009; Wong et al., 1980). Approaches based on grand-average templates of ‘expected’ response waveforms are problematic, because even small differences in electrode placement and subject condition between investigators/labs will introduce systematic waveform differences, making it difficult to establish a uniform, and uniformly effective, template.
In recent years, methodological approaches have expanded to the use of artificial intelligence and neural networks to model parameter estimation and feature classification (Acır et al., 2006; Delgado et al., 1994; McKearney et al., 2019; Ozdamar et al., 1994; Sanchez et al., 1995; Vannier et al., 2002). While these machine-learning based approaches may yet prove useful in objectifying ABR analyses, extant algorithms focus on the presence or absence of a response (rather than providing a nominal threshold) and are therefore not necessarily backwards compatible with existing datasets, well-established acquisition protocols, or different subject pathologies. Additionally, implementation of such algorithms typically requires significant troubleshooting and optimization, which often proves time-consuming and may be reliable only in a narrow range of clinical/laboratory conditions.
In the absence of standardized threshold identification, ABR testing can be prone to an unsatisfactory level of subjectivity. We sought to develop an algorithm that utilizes aspects of signal processing and detection theory to automate the objective determination of ABR thresholds. In addition to streamlining and objectifying the analysis process, we also sought an approach that could be incorporated into the data-acquisition software, to alert users if threshold had been ‘missed’ at the lowest level presented, or to automatically switch from finer to coarser level steps at suprathreshold levels, thereby increasing speed and accuracy during data acquisition. Widespread utilization of this algorithm would also allow better comparisons of ABR data across laboratories.
2. Material and methods
Mouse ABR data from two previously published studies, hereafter referred to as ‘Dataset 1’ (Maison et al., 2013) and ‘Dataset 2’ (Sergeyenko et al., 2013), were reanalyzed using an algorithm based on cross-covariance analysis of adjacent-level averaged waveforms.
ABR testing was performed on anesthetized mice. Acoustic stimuli were generated by digital-to-analog converters (National Instruments) controlled through custom LabVIEW software and delivered through a custom acoustic system consisting of two miniature dynamic drivers (CDMG15008–03A; CUI) and an electret condenser microphone (FG-23329-PO7; Knowles) coupled to a probe tube (for further details: http://www.masseyeandear.org/research/ent/eaton-peabody/epl-engineering-resources/epl-acoustic-system/). Stimuli were 5-ms tone pips (0.5-ms cos² rise-fall) delivered in alternating polarity at a repetition rate of 35/s. ABRs were recorded through sub-dermal needle electrodes (Grass) inserted at the vertex and pinna, with a ground reference near the tail (vertex-pinnaD configuration; Shaheen et al., 2015). Responses to as many as 1024 stimuli were amplified 10,000 times, filtered at 0.3 – 3 kHz, sampled at 25 kHz with 16-bit depth, and averaged; levels were presented in 5 dB steps from below threshold to a maximum of 80 dB SPL. Such level series were obtained in each animal at multiple roughly log-spaced frequencies from ~5.6 to 45.2 kHz.
The original visual thresholding strategies differed between the two studies/datasets. In both, ABR thresholds were defined as the lowest SPL at which the wave morphology conformed to a consistent pattern (with peak latencies decreasing systematically with increasing presentation level). For dataset 1, threshold was defined as the lowest level at which any waveform peak (i.e. 1 through 5) was observable. For dataset 2, threshold was defined as the lowest level at which peak 1 was still observable. Thus, all subsequent analyses of these datasets were performed on the appropriately windowed ABR waveform, with dataset 1 windowed to the entire waveform (0 - 8.5 ms) and dataset 2 windowed around peak 1 (0 - 2.5 ms).
Averaged ABR waveforms were imported into MATLAB (R2017b, The MathWorks, Inc., Natick, Massachusetts, USA) and each waveform was processed through a zero-phase, two-pole Butterworth band-pass filter (200 – 10,000 Hz; Buran et al., 2010). Cross-covariance analysis was performed between the windowed waveform at each level and that at the level one step (usually 5 dB) higher, using the MATLAB function xcov with the 'coeff' scaling option. Normalized cross-covariance was used to minimize the contribution of mean waveform amplitude, which is expected to increase in a level-dependent manner. Correlation coefficients for each level-pair were extracted from the resultant correlogram (signal lag = 0) and used to generate correlation-coefficient vs. level functions. Because the number of sub-threshold levels tested varied between investigators, and affected the shape of the resulting correlation-coefficient vs. level function, two curve types were fit.
A sigmoidal function using MATLAB Central File Exchange function sigm_fit:
y = a + (b − a) / (1 + 10^((c − x)·d))    (1)
x = lower level of correlated pair (in dB SPL); y = correlation coefficient;
a = minimum value; b = maximum value; c = x value at midpoint; d = slope
A two-term power law model using MATLAB function fit and 'power2' fit type:
y = a·x^b + c    (2)
x = lower level of correlated pair (in dB SPL); y = correlation coefficient;
a = coefficient; b = exponent; c = constant
Both functions were fit to correlation-coefficient vs. level data for each ABR run, from which threshold could be interpolated or extrapolated at a selected ‘criterion’ value. We chose to use the sigm_fit function rather than generic minimization of a standard sigmoid because sigm_fit conveniently outputs the range (i.e. minimum and maximum correlation coefficients) of the fitted curve that is then passed through the decision tree. We reasoned that this simplification would make the algorithm more accessible to those with minimal coding background, thereby increasing potential adoption of the algorithm. Further details of the final algorithm are found in Results.
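To make the analysis chain concrete, a minimal MATLAB sketch is given below. This is an illustration, not the published implementation: it assumes averaged waveforms are stored as rows of a matrix W (levels × samples, lowest level first) with corresponding SPLs in a vector levels, that the File Exchange sigm_fit returns its standard four-parameter vector [min, max, x50, slope], and that the Signal Processing, Statistics, and Curve Fitting Toolboxes are available.

```matlab
% Minimal sketch of the analysis chain (variable names are illustrative).
% W: (nLevels x nSamples) matrix of averaged waveforms, lowest level first;
% levels: vector of presentation levels (dB SPL); fs: sampling rate (Hz).
fs = 25000;
[b, a] = butter(2, [200 10000]/(fs/2), 'bandpass');
Wf = filtfilt(b, a, W')';                 % zero-phase band-pass filtering

win = round([0 2.5e-3]*fs) + 1;           % e.g. wave-1 window (0 - 2.5 ms)
nLvl = size(Wf, 1);
cc = nan(nLvl-1, 1);
for k = 1:nLvl-1                          % adjacent-level cross-covariance
    [c, lags] = xcov(Wf(k, win(1):win(2)), Wf(k+1, win(1):win(2)), 'coeff');
    cc(k) = c(lags == 0);                 % normalized coefficient at zero lag
end

x = levels(1:end-1); x = x(:);            % lower level of each correlated pair
y = cc;
crit = 0.35;                              % criterion value (see Results)

p = sigm_fit(x, y);                       % p = [a b c d] of Equation 1
% invert Equation 1 analytically at y = crit:
thrSig = p(3) - log10((p(2) - crit)/(crit - p(1))) / p(4);

fPow = fit(x, y, 'power2');               % y = a*x^b + c, Equation 2
% numerically locate the criterion crossing (assumes a crossing exists):
thrPow = fzero(@(xx) feval(fPow, xx) - crit, median(x));
```

Either thrSig or thrPow is then passed through the decision tree described in Results, which also handles runs where the fitted curves never cross the criterion.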
3. Results
A total of 1646 ABR runs from 71 mice aged 8-104 weeks with varying cochlear function were used in this study. Each run consists of a level-series of averaged ABR waveforms in response to one of the several tone-pip frequencies tested in each animal. To develop and validate our algorithm, we chose two published studies from which we could access all the waveforms, as well as the thresholds originally determined by the investigators via visual inspection. One study (Maison et al., 2013) included data from mice aged 6 – 50 wks, roughly half of which were surgically de-efferented at 6 wks of age. This dataset was chosen because it was particularly large (1230 runs from 25 animals). The other study (Sergeyenko et al., 2013) included data from normal-aging mice aged 6 – 104 wks, including 416 runs from 46 animals. It was chosen because aged mice show small ABR responses and atypical waveforms that present a challenge to threshold estimation by visual inspection. The two datasets were also complementary, because, in dataset 1, the investigators considered the entire waveform in their threshold determinations (i.e. waves 1 – 5), whereas in dataset 2, the intent was to evaluate the threshold for wave 1 only. For the present study, all data were de-identified, pooled and processed together to ensure that the algorithm was robust to waveforms with variable amplitudes and signal-to-noise ratios.
3.1. Visual Thresholding
Observer 1 for each dataset was the investigator who collected and/or visually thresholded ABR data in the original study (dataset 1: SFM; dataset 2: SGK). All ABR runs were first visually re-thresholded by the first author of the present study (observer 2), adhering to a commonly accepted strategy: i.e. examine a waveform stack (Figure 1A,D), and choose the lowest level at which the latencies of clearly visible peaks have continuously increased re those at higher levels. The example stack from dataset 2 (Figure 1D), where wave 1 threshold was the stated goal, illustrates why even experienced observers can disagree: at 40 dB, a wave 1 conforming to the latency rule is clear; at 35 dB, the only wave 1 candidate occurs 0.2 ms earlier than at 40 dB, and is therefore non-conforming; yet at 30 dB, there is again a conforming wave 1 candidate. One observer chose 30 dB, the other chose 40 dB.
In both datasets, observer 2 identified cases in which threshold was ‘missed’ (i.e. level presentations were not low enough) or traces seemed ‘too noisy’ to confidently determine threshold (dataset 1: n = 51; dataset 2: n = 69). Nevertheless, as shown in Figure 1, thresholds between observers were strongly correlated in both dataset 1 (r = 0.8592, p < 0.001; Figure 1B) and dataset 2 (r = 0.8718, p < 0.001; Figure 1E). Although there were a surprising number of cases where observers differed by ≥ 10 dB (Figure 1C & 1F), a majority of threshold selections were identical; thus, the best-fit straight lines have slopes very close to unity (solid black lines Figure 1B,E).
3.2. Automatic Thresholding
The fitting procedure is summarized in Figure 2, with an example from each dataset. Cross-covariance analysis between adjacent-level presentations was performed on the windowed portion of the waveform (grey shading in Figure 2A & 2E), as reflected in the total signal lag of the resultant correlogram (2 × windowed signal length in milliseconds; Figure 2B & 2F). Computed correlation coefficients for each level pair at zero signal lag (dotted lines in Figure 2B & 2F, respectively) were fit to both a sigmoidal function (Figure 2C & 2G) and a power function (Figure 2D & 2H). For sigmoidal fits, prediction intervals were calculated using the MATLAB function nlpredci (blue shading; Figures 2, 3 & 5). For power fits, predicted values were extrapolated using feval, and prediction intervals using predint (pink shading; Figures 2, 3 & 5).
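As a rough sketch of how such intervals can be obtained (continuing the Methods sketch above): nlpredci needs the residuals and Jacobian of the sigmoid fit, which can be supplied by refitting Equation 1 with nlinfit; the power-fit intervals come directly from the cfit object. Variable names here are illustrative.

```matlab
% Sketch: 95% intervals for both fits (continues the Methods sketch).
xq = (0:80)';                                % query levels, dB SPL

sigmoid = @(p, xx) p(1) + (p(2)-p(1)) ./ (1 + 10.^((p(3)-xx).*p(4)));
p0 = [min(y) max(y) median(x) 0.1];          % crude starting values
[beta, R, J] = nlinfit(x, y, sigmoid, p0);   % refit Eq. 1 for residuals/Jacobian
[ySig, delta] = nlpredci(sigmoid, xq, beta, R, 'Jacobian', J);
sigBounds = [ySig - delta, ySig + delta];    % cf. blue shading in Figs. 2, 3 & 5

yPow = feval(fPow, xq);                      % extrapolated power-fit values
powBounds = predint(fPow, xq, 0.95, 'functional', 'off');   % cf. pink shading
```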
Figure 2. Cross-covariance analysis.
A,E: Example stacked waveforms from Datasets 1 and 2 in response to 11.3- and 12.1-kHz tone-pips, respectively. Shading indicates the visual thresholding window, i.e. all waves for Dataset 1 and wave 1 only for Dataset 2. Thicker traces indicate visual thresholds for Observers 1 & 2. B,F: Cross-covariance between waveforms at adjacent-level presentations for the examples in A and E. For clarity, only a subset of traces are shown, with SPLs indicated at the peak cross-covariance value. C,G: Sigmoidal fits of the correlation-coefficient (at 0 signal lag) vs. level function. D,H: Power fits of the correlation-coefficient vs. level functions. The 95% prediction intervals are shown by shading and the visual thresholds are indicated with arrows.
Figure 3. Effect of level sampling on algorithmic fitting.
A: An ABR level series in response to 12.1 kHz tone-pips, with visual thresholds for Observers 1 & 2 indicated by thicker traces. B-D: The effects of undersampling are illustrated by removing subsets of averaged waveforms, rerunning the cross-covariance analysis and fitting the resultant points. E-H: For each set of original or undersampled waveforms, the correlation-coefficient vs. level functions are fit using either sigmoidal (blue) or power (pink) functions. The green line shows the criterion level of 0.35. The algorithmic threshold and its corresponding 95% prediction interval are shown in the panel corresponding to the fitting scheme chosen by the algorithm.
Figure 5. Examples of decision tree outcomes.
Example ABR level series (with observer 1 and 2 thresholds emboldened) and their corresponding correlation-coefficient vs. level functions fit using either sigmoidal (blue) or power (pink) functions. Green arrows indicate which fit was selected by the algorithm. In addition to threshold values, 95% prediction intervals are given. Letters (A-D) refer to the flowchart in Figure 4. A: The sigmoidal fit has a lower RMS error (0.03 vs. 0.06), and the fit min and max values span the criterion (0.35). B: The power fit has a lower RMS error (0.12 vs. 0.5) and the adjusted R2 (0.84) exceeds 0.7. C: The sigmoidal fit fails based on the slope conditional (d = 0.0047) and RMS error (0.1836 vs. 0.1828). However, because the adjusted R2 (0.6) of the power2 fit is less than the 0.7 cutoff, the run is flagged as needing visual verification. D: The sigmoid fit yielded the lower error term (0.080 vs. 0.083); however, neither fit crosses the criterion value of 0.35.
When restricting the waveform window around wave 1 (dataset 2), correlation coefficients approach the maximum possible value of 1 (Figure 2G-H). For the entire-waveform windowing (dataset 1; Figure 2C-D), the correlation coefficients peak at a somewhat lower value, presumably because the later peaks show more level-dependent complexity, with so many contributing generators at high stimulus levels.
To build an autothresholding algorithm, we sought to fit each correlation-coefficient vs. level series with a simple function, and set a criterion level of correlation coefficient by which to define threshold by interpolation/extrapolation. As illustrated in Figure 3A, when the ABR series includes adequate sub-threshold and supra-threshold level steps, the correlation-coefficient vs. level function (Figure 3E) looks sigmoidal, and, indeed, a sigmoid fit has a lower RMS error than a power-function fit. Threshold, if arbitrarily defined as a correlation coefficient of 0.35, is 38.0 dB, i.e. roughly midway between the values chosen by Observer 1 (35 dB) and Observer 2 (40 dB). If the data were collected in 10 dB steps (Figure 3B), the power fit provides a lower RMS error and the interpolated threshold differs by 2.9 dB. In a case such as this, with high signal-to-noise ratios, removing supra-threshold levels (Figure 3C) has little effect: a sigmoidal fit (Figure 3G) still has a lower RMS error, and the interpolated threshold differs by only 0.3 dB. If the experimenter “misses” threshold, i.e. the lowest level is not low enough (Figure 3D), the power fit now has the lower RMS error, but the extrapolated threshold differs from that obtained using the sigmoidal fit by only 1.4 dB. The similarity of the interpolated or extrapolated thresholds under these four conditions suggests the robustness of an auto-thresholding process and its potential utility in flagging, during data acquisition, a failure to choose appropriate presentation levels. Notice that in each case, the prediction bounds of the algorithmically determined threshold can be used to further assess the robustness of the interpolated threshold. Indeed, when levels around threshold are adequately sampled (e.g. Figure 3A & 3C), the prediction interval mirrors the discrepancy between observers.
Based on an iterative process, in which we analyzed all the ABR runs in Datasets 1 and 2, examined any cases in which the algorithm failed to match the visual thresholding results, and then modified the algorithm and repeated the process, a final decision tree evolved (Figure 4). Fundamentally, the algorithm simply fits the correlation-coefficient vs. level function with a sigmoidal and a power function (the MATLAB function sigm_fit or fit with the 'power2' model, respectively), and chooses the fit with the lower RMS error (conditional C2). For the sigmoidal fit, the added conditional (C1) stipulates that the min and max values (a and b in Equation 1) must span the threshold criterion (see below), and that the slope of the fit (d in Equation 1) is not too close to either 0 or 1. An example ABR run following this path (A) is illustrated in Figure 5A. For the power-fit arm, we add two more conditionals. C3: if the goodness of fit (adjusted R2; degree-of-freedom adjusted coefficient of determination) is greater than 0.7, threshold is interpolated, and the result is deemed highly reliable (path B, see Figure 5B). If the goodness of fit is poor, the last conditional (C4) assesses whether the correlation coefficient ever exceeds criterion. If so, the threshold is interpolated, but flagged for follow-up visual inspection (path C, see Figure 5C). If not, no threshold is suggested, and it is likely that no response was present (path D, see Figure 5D), or perhaps only a putative response at the highest sound level.
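In code, the decision tree reduces to a few conditionals. The sketch below assumes the fits and thresholds from the earlier sketches, plus their error statistics (rmseSig, rmsePow, adjR2Pow); the numerical slope cutoffs for C1 are assumptions, chosen only to be consistent with the behavior described for Figure 5C.

```matlab
% Sketch of the decision tree in Figure 4 (C1 slope cutoffs are assumed).
sigmoidValid = p(1) < crit && p(2) > crit ...   % C1: fitted range spans criterion
            && p(4) > 0.005 && p(4) < 1;        % C1: slope not ~0 and below 1

if sigmoidValid && rmseSig <= rmsePow           % C2: sigmoid is the better fit
    threshold = thrSig;  flagged = false;       % path A
elseif adjR2Pow > 0.7                           % C3: power fit is reliable
    threshold = thrPow;  flagged = false;       % path B
elseif max(y) > crit                            % C4: response exceeds criterion
    threshold = thrPow;  flagged = true;        % path C: verify visually
else
    threshold = NaN;     flagged = true;        % path D: likely no response
end
```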
Figure 4. Flowchart of the decision tree for the thresholding algorithm.
The input is an ABR waveform level series from which the correlation-coefficient vs. level function is fit by either sigmoidal or power functions. C1: this conditional compares, for the sigmoidal fit, the slope (d) and range (min and max values a and b) to acceptable values. C2: compares RMS errors of the two fits to choose the better strategy. C3: flags noisy waveforms by assessing the adjusted R2 of the fit. C4: flags cases requiring visual thresholding, i.e. where the maximum fit value is less than criterion (0.35). A-D: refer to example waveform stacks taking each of the four logical paths through the algorithm, as illustrated in Figure 5. The percentages of ABR runs taking each of the four logical paths for each of the two datasets are given at the bottom (green).
To define a single criterion value applicable regardless of age, frequency or cochlear function, Spearman’s correlation coefficient between observer and algorithm thresholds was computed at criterion values from 0.1 to 0.5 (Figure 6A), along with the stringency of the algorithm (Figure 6B), i.e. the percentage of runs for which the measured correlation coefficient exceeded criterion (paths A, B or C in Figures 4 and 5). Visual inspection of these data suggested that the criterion best matching our observers was a correlation coefficient of 0.35. As shown in Figure 6C-F, individual data points suggest a slight tendency of the algorithm to overestimate thresholds in dataset 1 at the selected criterion level of 0.35. Nevertheless, at a criterion of 0.35, mean algorithm and observer thresholds are well correlated across all frequencies tested for both datasets. Other labs implementing this algorithm could perform similar analyses to select the criterion that best matches their visual-thresholding strategies.
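This optimization amounts to re-running the thresholding over a grid of criteria and correlating the results with the visual thresholds. A sketch follows; thresholdAtCriterion and runs are hypothetical placeholders for a per-run call into the algorithm with a given criterion.

```matlab
% Sketch of the criterion sweep behind Figure 6A (placeholder helpers).
criteria = 0.10:0.05:0.50;
rho = nan(size(criteria));
for k = 1:numel(criteria)
    algoThr = arrayfun(@(r) thresholdAtCriterion(r, criteria(k)), runs);
    rho(k) = corr(visualThr(:), algoThr(:), ...
                  'Type', 'Spearman', 'Rows', 'complete');  % skip NaN runs
end
[~, best] = max(rho);    % in our hands, a criterion of 0.35 matched observers
```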
Figure 6. Optimization of threshold criterion.
A: For each dataset and each observer, the correlation (Spearman’s r) between observer and algorithmic thresholds was assessed at different criterion values. B: The stringency of the algorithm was assessed by tracking the percentage of ABR runs for which the computed correlation coefficient exceeded criterion, i.e. taking paths A, B, or C in the flowchart (Figure 4). C,D,E,F: Scatterplots comparing observer and algorithm thresholds with a color code for tone-pip frequency, for both individual observations (small points) and mean values (large points). Dashed lines indicate equality ±20 dB between the two measures, and the best-fit line is shown in solid black.
3.3. Assessing Performance: visual vs algorithmic
A basic goal of any auto-thresholding process is to match existing visual thresholding norms, while establishing a rule-based approach that eliminates user bias. Beyond that practical goal, one can ask which approach is better, or more accurate, in some absolute sense.
One way to assess algorithm performance is to compare the averaged waveforms for stimulus levels near threshold, as defined by either the automatic algorithm or by visual thresholding. Examples for stimulus frequencies near 22 kHz are shown in Figure 7. For Dataset 1, which included many animals with normal and near-normal thresholds, the algorithm (Figure 7C) appears to be doing a better job than the visual method (Figure 7A-B) of extracting consistent waveform features near threshold, as evidenced qualitatively by smoother looking peaks and troughs. For Dataset 2, which includes aging animals with greatly attenuated ABR responses, the algorithm appears to do a better job in that there is greater coherence between the mean threshold waveform and those at higher stimulus levels, especially when focusing on peaks around the windowed portion of the waveform. Note that these qualitative differences exist despite the fact that the mean thresholds determined by Observer 1, Observer 2 and the algorithm were all within 2.9 dB for dataset 1 and within 1.4 dB for dataset 2.
Figure 7. Assessing algorithm performance by comparing threshold-normalized, grand-average waveforms (A-F), correlation-coefficient vs. level functions and amplitude vs. level functions (G,H).
A-F: For all ABR runs at 22.6 kHz from Dataset 1 (A,B,C) or 21.1 kHz from Dataset 2 (D,E,F), the stimulus level of each waveform in each stack was redefined as dB re threshold, according to either Observer 1 (A,D), Observer 2 (B,E) or the algorithm (C,F). Then, for each threshold-picker, waveforms were grouped into 5 dB bins of level re threshold and then averaged. Means and standard errors are displayed in A-F for each dataset/observer for sound levels from −10 to 25 dB re threshold. Grey shading indicates the waveform window used for threshold determination. G,H: Comparison of average wave 1 amplitude (left Y axis, square symbols & blue lines) and adjacent-level correlation coefficients (right Y axis, circular symbols & orange lines) vs. level re threshold, as determined using either algorithmic or visual methods. Data are extracted from the same runs shown in A-F. Threshold mean ± SEM values for: dataset 1 were 31.7 ± 1.0 (observer 1), 30.7 ± 1.2 (observer 2), 33.6 ± 1.0 (algorithm); dataset 2 were 43.4 ± 1.3 (observer 1), 44.4 ± 1.5 (observer 2), 44.7 ± 1.9 (algorithm).
To quantitatively assess performance of the algorithm compared to visual observers, we computed the cross-correlations between adjacent levels for each set of grand-average, threshold-normalized waveforms (Figure 7G-H; circular symbols & orange lines, right Y axis). While mean thresholds are comparable between methods, the correlation coefficients at threshold levels (and at subthreshold levels for Dataset 1) are higher for the algorithm, suggesting that it has more reliably captured true waveform similarities at the lowest sound pressure levels. Note that at suprathreshold levels, the correlation coefficients of adjacent-level waveforms are practically identical across methods.
To further quantify algorithm performance, we compared the mean wave 1 amplitude-vs-level functions when expressed relative to threshold (Figure 7G-H; square symbols & blue lines; left Y axis). Between methods (i.e. Observer 1, Observer 2 and Algorithm), a higher mean Wave 1 amplitude at any given level implies better ‘binning’ of waveforms relative to threshold. Two-way ANOVA showed no significant main effects of method for either dataset.
4. Discussion
Computer algorithms have long been used in auditory research to objectively measure thresholds for sound-evoked electrophysiological responses. Decades ago, Kiang and colleagues designed a computer algorithm to automate the measurement of tuning curves from single auditory-nerve fibers, which became widely adopted: It tracked an iso-response contour corresponding to the sound pressure levels required, as a function of stimulus frequency, to raise the fiber’s response rate by 10 spikes per second over the background rate (Kiang et al., 1970; Liberman, 1978). Similarly, over the years, many labs have implemented computer programs to automate threshold measurement for the round-window compound action potentials (CAPs), by tracking the sound pressure required, as a function of tone-pip frequency, to evoke a CAP of a criterion peak-to-peak amplitude (e.g. Liberman, 1991). Both these algorithms were simple to describe, quick to administer and remarkably reproducible. Although CAP thresholds measured in this way closely paralleled the underlying single-fiber thresholds, the CAP thresholds were significantly higher (Liberman, unpublished), because the practical limit on averaging time (to minimize measurement time) ensures that noise in this type of gross far-field potential masks the smallest response changes in the most sensitive elements that contribute to it.
Although it would be equally desirable to automate measurement of ABR thresholds, the problem is more challenging, because ABRs are much smaller than CAPs, and require much more signal averaging. Although an iso-response approach could be applied to ABR threshold tracking, the acquisition times would be so long, and the criterion response amplitude would have to be so high (and the resultant “thresholds” so far above behavioral thresholds) that any approach of this type would be unsatisfactory. Instead, ABR thresholds are usually assessed by examination of waveform morphology, typically by visual inspection of a stack of waveforms obtained at increasing sound levels, and the subjective selection of the lowest sound-pressure level at which “consistent” peaks or troughs are observed. This is the accepted practice, because emergence of a consistent waveform morphology occurs at a much lower stimulus level than that at which the peak-peak amplitude reliably exceeds that obtained in silence.
We are not the first to develop and study an algorithm to automate the analysis of ABR waveforms, but, to date, none has been widely accepted. Most prior attempts at computerized ABR analysis have been directed toward human data, where the aim was often to determine pass/fail of an isolated response (e.g. Cebulla et al., 2000; Davey et al., 2003; Davey et al., 2007; McKearney et al., 2019). In some cases, the aim was to recognize threshold during testing, to speed up the data acquisition process (e.g. Ozdamar et al., 1994; Wicke et al., 1978); however, most have been designed for post hoc analysis of stored waveforms (Sanchez et al., 1995).
Some prior studies required a statistical analysis of all the unaveraged waveforms obtained at each stimulus level (e.g. Sininger, 1993). This approach requires modification of existing acquisition software, as well as significant storage space, and is not backwards compatible with any stored datasets of waveform averages.
Other approaches were based on training of artificial neural networks to match the pass/fail classification of waveforms by expert observers (Davey et al., 2003; Davey et al., 2007; McKearney et al., 2019), an approach that requires retraining on each new type of response morphology. Still other approaches required the definition of response templates to which the candidate waveforms are compared, usually by cross-correlation (e.g. Cone-Wesson et al., 1997; Elberling, 1979). We wanted to avoid the need for retraining and/or the use of templates, because we wanted our algorithm to be applicable to a wide range of inner-ear pathologies, both genetic and acquired, and it is clear that ABR waveforms can differ significantly across pathologies in both animal and human studies.
Some prior approaches to ABR thresholding algorithms have been based on cross-correlations between waveforms from two separate averages obtained at the same stimulus frequency/level (e.g. Arnold, 1985; Weber et al., 1980). This doubling of waveforms can be achieved by doubling the acquisition time, but that represents a significant drawback. Such a cross-correlation approach could also be implemented by splitting the set of single-repetition waveforms into two subsets (Berninger et al., 2014), but that increases the noisiness of each average waveform (by √2), which is also a significant drawback. Indeed, we tried that approach, since our software allows us to extract the separate responses to the positive- vs. negative-polarity tone-bursts in the tone-pip train. We found those correlation-coefficient vs. level functions, where each average comprised only half the number of stimulus repetitions, to be noisier (and less reliable in extracting threshold) than those derived by comparing the full ensemble averages across sound levels.
At least one prior study used the same approach adopted here, i.e. cross-correlation analysis of waveforms from adjacent stimulus presentation levels (Grandori, 1983). Although the prior study demonstrated the promise of such an approach, the analysis was not fully developed into a usable algorithm and was only evaluated for a small number of waveforms (5 subjects). Indeed, many prior studies tested their algorithms on relatively small numbers of sample waveforms. In contrast, our algorithm has been more extensively vetted, i.e. by testing against a database of 1646 waveform stacks, from normal and non-normal animals, collected and visually thresholded by one of two investigators, and then independently assessed by a third.
Our algorithm can be incorporated into data-acquisition software to speed data collection by increasing step size after threshold is reached (in an increasing-level series) or by reversing direction and decreasing step size after threshold is reached (in a decreasing-level series). The algorithm can also be used during data acquisition to alert users (in an increasing level series) when they have “missed” threshold, when it detects a supra-criterion correlation between the waveforms obtained at the lowest sound levels. This is a common problem, in our experience, especially among novice users.
One of the most appealing aspects of the adjacent-level cross-correlation approach is its conceptual simplicity. It is easy to grasp, at least in part, because the algorithm is essentially doing exactly what users are taught to do when visually thresholding an ABR waveform stack. Because the algorithm operates on a typical ABR waveform stack, it is backwards (or forwards) compatible with archival (or ongoing) datasets obtained using any custom or commercial system, with only minimal manipulation to format the input waveforms appropriately. By windowing the waveform stacks in the time domain, the algorithm can be customized to track thresholds for different ABR waves, as was shown here. By computing algorithm thresholds for a set of visually thresholded stacks, each laboratory/user can choose the cross-correlation criterion value that best matches the strictness with which they assess waveform “consistency”, while still retaining the information necessary to compare between different laboratories or even different individuals/times within laboratories.
By curve-fitting the correlation-coefficient vs. level functions, and picking the point at which the fitted curve crosses the criterion value of correlation, our algorithm can interpolate threshold, arguably providing more accuracy than that afforded by visual thresholding, where the user is constrained to the 5 or 10 dB level increments at which the waveforms are obtained. Additionally, the ability to flag suspicious runs and estimate thresholds, even if highly error-prone, at least objectifies the determination of ‘signal’ vs. ‘noise’. Indeed, the superior accuracy of the algorithm re visual thresholding is suggested by two observations: 1) the algorithm generally arrives at thresholds similar to those determined visually (Figure 6C-D), yet 2) when the same set of waveforms is grouped and averaged according to dB re threshold, the wave morphologies are more robust both at and even below threshold when using the algorithmic thresholds (Figure 7A-E).
Here, we validated our autothresholding algorithm against a large set of tone-pip ABRs from mice with a range of cochlear pathologies and threshold sensitivities. We have also validated the algorithm on click-evoked ABRs from mouse and on tone-pip evoked ABRs from guinea pig (data not shown). Because the algorithm relies only on intrinsic comparisons, species or stimulus differences do not affect performance. Similarly, the approach should work equally well for vestibular evoked potentials (Jones et al., 1999), electroretinograms (Brown, 1969), or any stimulus-evoked electrical responses where the data-acquisition paradigm consists of a stimulus-level series and a need to objectively determine a “threshold”. Extending to other applications should only require an initial comparison to visually determined thresholds to establish the most appropriate correlation-coefficient criterion.
Slight modifications to the algorithm could automate the extraction of other response parameters. For example, latency-level functions could be extracted, even from noisy data, by time-windowing the responses around the peak of interest and measuring the time lag producing the maximum correlation between adjacent-level waveforms.
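For example, reusing the cross-covariance machinery from the Methods sketch, the per-step latency shift could be read from the lag of the correlogram peak. In the sketch below, w1 and w2 are adjacent-level waveforms windowed around the peak of interest, and fs is the sampling rate; names are illustrative.

```matlab
% Sketch: latency shift between adjacent levels from the correlogram peak.
[c, lags] = xcov(w1, w2, 'coeff');
[~, iMax] = max(c);                        % lag of maximum correlation
latShiftMs = lags(iMax) / fs * 1000;       % signed shift (ms) per level step
```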
Objectifying the threshold criterion for this widely used metric of cochlear function should ultimately reduce variance within research groups, and improve reliability of comparisons across groups. The objectification of threshold measures is particularly important given the rising number of translational studies aimed at rebuilding a damaged inner ear, where the ABR responses are used as a key metric of functional recovery, and where the amount of recovery is often slight, and therefore strongly subject to observer bias. Furthermore, the algorithm described here should be applicable to any evoked response measured as a function of increasing stimulus level, which need not be restricted to the auditory system.
Highlights.
Objective thresholding of Auditory Brainstem Responses can be performed computationally using a simple algorithm based on signal processing theory
Our algorithm is easily applied during data acquisition to alert users to errors in level selection, or post hoc on archival data to remove potential user bias
Quantitative analyses suggest that algorithmically determined thresholds outperform visually determined thresholds
Acknowledgements:
The authors would like to thank Stephane Maison and Sharon Kujawa for providing access to the ABR data, John Guinan and Bertrand Delgutte for their helpful comments on the manuscript, and Ken Hancock for his assistance in packaging the algorithm into the existing EPL Cochlear Function Test Suite & ABR Peak Analysis software.
Funding: Research reported in this publication was supported by National Institute on Deafness and other Communicative Disorders of the National Institutes of Health under award number R01 DC000188. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Abbreviations used:
- ABR: Auditory Brainstem Response
- SPL: Sound Pressure Level
References
- Achor LJ, Starr A 1980a. Auditory brain stem responses in the cat. I. Intracranial and extracranial recordings. Electroencephalogr Clin Neurophysiol 48, 154–73.
- Achor LJ, Starr A 1980b. Auditory brain stem responses in the cat. II. Effects of lesions. Electroencephalogr Clin Neurophysiol 48, 174–90.
- Acır N, Özdamar Ö, Güzeliş C 2006. Automatic classification of auditory brainstem responses using SVM-based feature selection algorithm for threshold detection. Engineering Applications of Artificial Intelligence 19, 209–218.
- Arnold SA 1985. Objective versus visual detection of the auditory brain stem response. Ear Hear. 6, 144–50.
- Berninger E, Olofsson A, Leijon A 2014. Analysis of click-evoked auditory brainstem responses using time domain cross-correlations between interleaved responses. Ear Hear. 35, 318–29.
- Bogaerts S, Clements JD, Sullivan JM, Oleskevich S 2009. Automated threshold detection for auditory brainstem responses: comparison with visual estimation in a stem cell transplantation study. BMC Neuroscience 10, 1–7.
- Brown KT 1969. The electroretinogram: its components and their origins. UCLA Forum in Medical Sciences 8, 319–78.
- Buran BN, Strenzke N, Neef A, Gundelfinger ED, Moser T, Liberman MC 2010. Onset coding is degraded in auditory nerve fibers from mutant mice lacking synaptic ribbons. J. Neurosci 30, 7587–97.
- Cebulla M, Sturzebecher E, Wernecke KD 2000. Objective detection of auditory brainstem potentials: comparison of statistical tests in the time and frequency domains. Scand Audiol 29, 44–51.
- Cone-Wesson BK, Hill KG, Liu G-B 1997. Auditory brainstem response in tammar wallaby (Macropus eugenii). Hear. Res 105, 119–129.
- Davey R, McCullagh P, McAllister G, Houston H 2003. Modelling of the brainstem evoked response for objective automated interpretation.
- Davey R, McCullagh P, Lightbody G, McAllister G 2007. Auditory brainstem response classification: a hybrid model using time and frequency features. Artif Intell Med 40, 1–14.
- Delgado RE, Ozdamar O 1994. Automated auditory brainstem response interpretation. IEEE Engineering in Medicine and Biology Magazine 13, 227–237.
- Don M, Elberling C, Waring M 1984. Objective detection of averaged auditory brainstem responses. Scand Audiol 13, 219–28.
- Elberling C 1979. Auditory electrophysiology. The use of templates and cross correlation functions in the analysis of brain stem potentials. Scand Audiol 8, 187–90.
- Elberling C, Don M 1984. Quality estimation of averaged auditory brainstem responses. Scand Audiol 13.
- Gorga MP, Worthington DW, Reiland JK, Beauchaine KA, Goldgar DE 1985. Some comparisons between auditory brain stem response thresholds, latencies, and the pure-tone audiogram. Ear Hear. 6, 105–112.
- Grandori F 1983. Detection of low intensity auditory evoked responses. Archives of Acoustics 8, 131–138.
- Henry KR 1979a. Auditory brainstem volume-conducted responses: origins in the laboratory mouse. Journal of the American Auditory Society 4.
- Henry KR 1979b. Auditory nerve and brain stem volume-conducted potentials evoked by pure-tone pips in the CBA/J laboratory mouse. Audiol. 18, 93–108.
- Henry KR 1979c. Differential changes of auditory nerve and brain stem short latency evoked potentials in the laboratory mouse. Electroencephalogr Clin Neurophysiol 46, 452–9.
- Jones TA, Jones SM 1999. Short latency compound action potentials from mammalian gravity receptor organs. Hear. Res 136, 75–85.
- Kiang NY, Moxon EC, Levine RA 1970. Auditory-nerve activity in cats with normal and abnormal cochleas. In: Sensorineural Hearing Loss. Ciba Found Symp, 241–73.
- Kirsh I, Thornton A, Burkard R, Halpin C 1992. The effect of cochlear hearing loss on auditory brain stem response latency. Ear Hear. 13, 233–5.
- Land R, Burghard A, Kral A 2016. The contribution of inferior colliculus activity to the auditory brainstem response (ABR) in mice. Hear. Res 341, 109–118.
- Lewis JD, Kopun J, Neely ST, Schmid KK, Gorga MP 2015. Tone-burst auditory brainstem response wave V latencies in normal-hearing and hearing-impaired ears. J. Acoust. Soc. Am 138, 3210–9.
- Liberman MC 1978. Auditory-nerve response from cats raised in a low-noise chamber. J. Acoust. Soc. Am 63, 442–55.
- Liberman MC 1991. The olivocochlear efferent bundle and susceptibility of the inner ear to acoustic injury. J Neurophysiol 65, 123–32.
- Liberman MC, Liberman LD, Maison SF 2014. Efferent feedback slows cochlear aging. J. Neurosci 34, 4599–607.
- Liberman MC, Liberman LD, Maison SF 2015. Chronic conductive hearing loss leads to cochlear degeneration. PLoS ONE 10, e0142341.
- Madsen SMK, Harte JM, Elberling C, Dau T 2018. Accuracy of averaged auditory brainstem response amplitude and latency estimates. Int. J. Audiol 57, 345–353.
- Maison SF, Usubuchi H, Liberman MC 2013. Efferent feedback minimizes cochlear neuropathy from moderate noise exposure. J. Neurosci 33, 5542–5552.
- McKearney RM, MacKinnon RC 2019. Objective auditory brainstem response classification using machine learning. Int. J. Audiol 58, 224–230.
- Mehraei G, Hickox AE, Bharadwaj HM, Goldberg H, Verhulst S, Liberman MC, Shinn-Cunningham BG 2016. Auditory brainstem response latency in noise as a marker of cochlear synaptopathy. J. Neurosci 36, 3755–64.
- Melcher JR, Kiang NY 1996a. Generators of the brainstem auditory evoked potential in cat. III: Identified cell populations. Hear. Res 93, 52–71.
- Melcher JR, Guinan JJ Jr., Knudson IM, Kiang NY 1996b. Generators of the brainstem auditory evoked potential in cat. II. Correlating lesion sites with waveform changes. Hear. Res 93, 28–51.
- Melcher JR, Knudson IM, Fullerton BC, Guinan JJ Jr., Norris BE, Kiang NY 1996c. Generators of the brainstem auditory evoked potential in cat. I. An experimental approach to their identification. Hear. Res 93, 1–27.
- Ozdamar O, Delgado RE, Eilers RE, Urbano RC 1994. Automated electrophysiologic hearing testing using a threshold-seeking algorithm. J Am Acad Audiol 5, 77–88.
- Rasetshwane DM, Argenyi M, Neely ST, Kopun JG, Gorga MP 2013. Latency of tone-burst-evoked auditory brain stem responses and otoacoustic emissions: level, frequency, and rise-time effects. J. Acoust. Soc. Am 133, 2803–17.
- Ridley CL, Kopun JG, Neely ST, Gorga MP, Rasetshwane DM 2018. Using thresholds in noise to identify hidden hearing loss in humans. Ear Hear. 39, 829–844.
- Sanchez R, Riquenes A, Perez-Abalo M 1995. Automatic detection of auditory brainstem responses using feature vectors. International Journal of Bio-Medical Computing 39.
- Sergeyenko Y, Lall K, Liberman MC, Kujawa SG 2013. Age-related cochlear synaptopathy: an early-onset contributor to auditory functional decline. J. Neurosci 33, 13686–13694.
- Shaheen LA, Valero MD, Liberman MC 2015. Towards a diagnosis of cochlear neuropathy with envelope following responses. J. Assoc. Res. Otolaryngol 16, 727–45.
- Sininger YS 1993. Auditory brain stem response for objective measures of hearing. Ear Hear. 14, 23–30.
- Vannier E, Adam O, Motsch JF 2002. Objective detection of brainstem auditory evoked potentials with a priori information from higher presentation levels. Artif Intell Med 25, 283–301.
- Vidler M, Parker D 2004. Auditory brainstem response threshold estimation: subjective threshold estimation by experienced clinicians in a computer simulation of the clinical test. Int. J. Audiol 43, 417–29.
- Walsh EJ, Gorga M, McGee J 1992. Comparisons of the development of auditory brainstem response latencies between cats and humans. Hear. Res 60, 53–63.
- Weber BA 1983. Pitfalls in auditory brain stem response audiometry. Ear Hear. 4, 179–84.
- Weber BA, Fletcher GL 1980. A computerized scoring procedure for auditory brainstem response audiometry. Ear Hear. 1, 233–6.
- Wicke JD, Goff WR, Wallace JD, Allison T 1978. On-line statistical detection of average evoked potentials: application to evoked response audiometry (ERA). Electroencephalogr Clin Neurophysiol 44, 328–43.
- Wong PKH, Bickford RG 1980. Brain stem auditory evoked potentials: the use of noise estimate. Electroenceph. Clin. Neurophysiol 50, 25–34.