
Keywords: auditory illusion, binaural hearing, dichotic pitch, inferior colliculus, rate-place code
Abstract
Dichotic pitches such as the Huggins pitch (HP) and the binaural edge pitch (BEP) are perceptual illusions whereby binaural noise that exhibits abrupt changes in interaural phase differences (IPDs) across frequency creates a tonelike pitch percept when presented to both ears, even though it does not produce a pitch when presented monaurally. At the perceptual and cortical levels, dichotic pitches behave as if an actual tone had been presented to the ears, yet investigations of neural correlates of dichotic pitch in single-unit responses at subcortical levels are lacking. We tested for cues to HP and BEP in the responses of binaural neurons in the auditory midbrain of anesthetized cats by varying the expected pitch frequency around each neuron’s best frequency (BF). Neuronal firing rates showed specific features (peaks, troughs, or edges) when the pitch frequency crossed the BF, and the type of feature was consistent with a well-established model of binaural processing comprising frequency tuning, internal delays, and firing rates sensitive to interaural correlation. A Jeffress-like neural population model in which the behavior of individual neurons was governed by the cross-correlation model and the neurons were independently distributed along BF and best IPD predicted trends in human psychophysical HP detection but only when the model incorporated physiological BF and best IPD distributions. These results demonstrate the existence of a rate-place code for HP and BEP in the auditory midbrain and provide a firm physiological basis for models of dichotic pitches.
NEW & NOTEWORTHY Dichotic pitches are perceptual illusions created centrally through binaural interactions that offer an opportunity to test theories of pitch and binaural hearing. Here we show that binaural neurons in auditory midbrain encode the frequency of two salient types of dichotic pitches via specific features in the pattern of firing rates along the tonotopic axis. This is the first combined single-unit and modeling study of responses of auditory neurons to stimuli evoking a dichotic pitch.
INTRODUCTION
Stimulating the two ears with broadband noise in which the interaural phase changes rapidly over a narrow frequency band produces the illusion of a pure tone against a background of noise (1–3). Because the monaural inputs to either ear alone do not produce a pitch percept, and the pitch is dependent on the change in interaural phase relationships, the percept is called dichotic pitch. Dichotic pitch phenomena offer an opportunity to investigate neural mechanisms for both pitch perception and binaural hearing and how they may interact. Two of the most salient dichotic pitches are the Huggins pitch (1) and the binaural edge pitch (2). A Huggins pitch (HP; Fig. 1B) is created when the interaural phase changes by 2π radians over a narrow “transition band” centered at frequency FB. Because the overall phase change is equal to one cycle, the interaural phase difference (IPD) at the center of the transition band differs from the IPD outside the transition band by π radians. Two widely studied variants of HP are HP− (Fig. 1B, right), where the IPD is equal to π at the center of the transition band and to 0 outside the transition band, and HP+ (Fig. 1B, left), where the IPD is 0 at the center of the transition band and π outside the transition band. In contrast, a binaural edge pitch (BEP; Fig. 1A) is created when the interaural phase changes by π radians over the transition band so that the frequency spectrum is divided into two regions differing in interaural phase by π radians. Two instances are when the interaural phases below and above the transition band are 0 and π, respectively (BEP+/−; Fig. 1A, left), and vice versa (BEP−/+; Fig. 1A, right).
Figure 1.

Illustration of two simple dichotic pitches and expected neural responses based on a cross-correlation model of binaural processing. For both types of dichotic pitches the same broadband noise is presented to both ears, but a sharp change in interaural phase is imposed over a narrow frequency band centered on a “boundary frequency,” FB. The percept is a tone at or near FB against a background of noise. A: the binaural edge pitch (BEP) stimulus transitions from being homophasic below FB to antiphasic above (BEP+/−) or vice versa (BEP−/+). IPD, interaural phase difference. B: the Huggins pitch (HP) stimulus is either in phase within a narrow band centered on FB and antiphasic elsewhere (HP+) or vice versa (HP−). C and D: predictions of a cross-correlation model of interaural time difference (ITD)-sensitive inferior colliculus (IC) neurons for BEP (C) and HP (D) stimuli. Inputs to each ear are band-pass filtered, delayed on one side, and then cross-correlated. A quadratic function relates interaural cross-correlation to predicted firing rate. C and D, top, show interaural phase spectra of the dichotic pitch stimuli for 3 values of FB in relation to the neuron’s frequency tuning curve (gray shading). C and D, bottom, show the predicted firing rate as a function of FB when the stimuli are presented with an imposed ITD equal to either the neuron’s best ITD (Ca, Cb, Da, Db) or the worst ITD (Cc, Cd, Dc, Dd). Colors code for the type of feature predicted: green for peaks and rising edges, red for troughs and falling edges; the same color code is used in Figs. 2–5. The model makes two predictions: 1) The firing rate shows a sharp feature (edge for BEP, peak or trough for HP) when FB is near the neuron’s BF. 2) The feature direction (rising vs. falling edge for BEP, peak vs. trough for HP) depends systematically on both the imposed ITD (best ITD vs. worst ITD) and the interaural phase configuration of the stimulus (BEP+/− vs. BEP−/+ and HP+ vs. HP−). 
See text and Table 2 for details.
Huggins pitch is closely related to binaural unmasking, the improved detection of a pure tone in noise that results from inverting the polarity of either the signal (N0Sπ, analogous to HP−) or the noise (NπS0, analogous to HP+) in one ear (4, 5). One class of binaural models assumes that the central binaural processor computes the interaural cross-correlation in narrowband frequency channels (6–8). Channels containing the signal band are decorrelated because their activity is determined by a mixture of the signal band IPD and the noise masker IPD. In theory, the contrast between this decorrelated activity and the fully correlated (N0Sπ, HP−) or anticorrelated (NπS0, HP+) activity in the flanking channels indicates the presence of the signal and creates a pitch percept (6–8).
Because a dichotic pitch is only perceived when inputs to the two ears are simultaneously present, studies of dichotic pitch may lead to specific tests for central hearing disorders. Deficits in Huggins pitch perception have been reported in subjects with dyslexia (9–11), although the deficits were not always specific to binaural hearing (12) and not always consistent across studies (13). Deficits in HP perception have also been reported for subjects diagnosed with autism spectrum disorder (14, 15).
Dichotic pitches have been called “illusions of binaural unmasking” (8). Yet, in many respects, dichotic pitch behaves perceptually and physiologically much as if an actual tone had been presented to the ears. Abrupt interaural phase shifts at multiple harmonically related frequencies (16) or combinations of harmonically related interaural phase shifts and actual pure tones (17) create a complex pitch percept at the common fundamental frequency, even if the stimulus contains no pitch-creating feature at the fundamental. HP stimuli produce excess forward masking of pure tones at the boundary frequency relative to the masking produced by broadband noise, much as if a tone had been added to the noise masker (18). Responses to HP stimuli recorded from the auditory cortex by electroencephalography (EEG) or magnetoencephalography (MEG) show long latency components distinct from responses to diotic broadband noise, suggesting specific neural processing of binaurally derived cues associated with the pitch percept (19, 20). Using MEG, Hertrich et al. (21) and Chait et al. (22) identified a pitch-onset response (POR) to HP stimuli distinct from the onset response, with morphology and latency similar to the POR to other pitch-evoking stimuli such as iterative ripple noise or pure tones in noise. The dependence of the POR latencies on stimulus parameters largely paralleled the subjects’ reaction times in HP detection (22). Using functional magnetic resonance imaging (fMRI), Puschmann et al. (23) demonstrated similar patterns of cortical activation in response to HP and pure tones in noise.
Together, the neural, perceptual, and modeling results cited above suggest that dichotic pitch stimuli may produce patterns of neural activity resembling those produced by pure tones in noise at a relatively early stage of processing, perhaps as early as the initial stage of binaural interactions in the superior olivary complex (SOC). Yet, there have been very few investigations of responses of binaural auditory neurons to dichotic pitch stimuli at the single-unit level, which remains the “gold standard” for understanding how brain activity relates to behavior. Mc Laughlin et al. (24) recorded responses of binaural neurons in the inferior colliculus (IC) of anesthetized cats to BEP stimuli, which they called “flip noise.” Because their goal was to compare binaural bandwidths of IC neurons with those of auditory nerve fibers, they did not attempt to relate neural responses to pitch percepts. Nevertheless, their results show that IC neurons encode the perceived pitch frequency of BEP stimuli via edges in firing rates when the boundary frequency FB is varied around the neuron’s best frequency (BF). Alsindi et al. (25) measured responses of a small sample of single units in the medial superior olive (MSO) of anesthetized guinea pigs to broadband noise stimuli with interaural delays in the range of several milliseconds that evoke a weak “dichotic repetition pitch” (DRP) in some human listeners. A small fraction of their units showed a correlate of DRP in their temporal discharge patterns in that the all-order interspike interval distribution had a peak at the expected pitch period. The authors concluded: “Future studies should consider measuring the responses of single SOC units to sounds which are capable of producing much stronger binaural pitch percepts, such as… Huggins pitch.”
We hypothesized that interaural time difference (ITD)-sensitive neurons in the IC would show correlates of strong dichotic pitches such as HP and BEP. These neurons exhibit frequency selectivity (26–28), are sensitive to interaural correlation (29–31), and show correlates of binaural unmasking (32–34). Responses of IC neurons to broadband noise stimuli varying in ITD are well predicted by a model (Fig. 1C) in which the band pass-filtered inputs to the two ears are cross-correlated after application of an internal delay to one side (35). We tested the hypothesis that responses of IPD-sensitive neurons to dichotic pitch stimuli are qualitatively consistent with the cross-correlation model by recording from single neurons in the IC of anesthetized cats. We used both HP (Fig. 1B) and BEP (Fig. 1A) stimuli but imposed an additional interaural delay to match either the best or worst ITD of each neuron, to maximize the expected effects of varying the boundary frequency FB on neural firing rates (24). For both HP and BEP, we found correlates of dichotic pitch in the form of features (peaks, troughs, or edges) in firing rate profiles when the boundary frequency FB was near the neuron’s BF, with the type of feature observed being as predicted by the cross-correlation model. We further show that a neural population model in which the behavior of individual neurons is governed by the cross-correlation model and the neurons have physiologically realistic distributions of BF and best IPD predicts trends in HP detection by human listeners.
METHODS
Animal Preparation
Experiments were performed on 11 healthy adult cats (9 male, 2 female) weighing between 2.5 and 4 kg. Anesthesia was induced with a mixture of diallyl barbituric acid (75 mg/kg; Sigma, St. Louis, MO) and urethane (300 mg/kg), and supplemental doses were administered as needed to maintain areflexia to strong toe pinches. Temperature and heart rate were monitored at all times, and a tracheotomy was performed to facilitate respiration. The pinnae were removed to allow insertion of hollow ear bars into the external meatus; then the animal was mounted on a stereotaxic apparatus. Each bulla was vented with ∼30 cm of polyethylene tubing to maintain equilibrium of middle ear pressure. Tissue was dissected from the posterior skull, which was then opened to expose the cerebellum. The cerebellum was partially aspirated to visualize the posterior surface of the IC. All procedures were approved by the Animal Care Committee of Massachusetts Eye and Ear.
Single-Unit Recording
Well-isolated single-unit recordings from the left IC were made with parylene-coated tungsten microelectrodes (Micro Probe, Potomac, MD) connected to the head stage of an Axoprobe amplifier (Molecular Devices, Sunnyvale, CA). Action potentials were detected with a Schmitt trigger, and occurrence times were recorded with 1-µs resolution by a custom-built event timer. Diotic broadband noise bursts were used as search stimuli (200-ms duration, presented once per second).
Stimuli
All stimulus waveforms were created in MATLAB (The MathWorks, Natick, MA) with a 20-kHz sampling rate, converted to analog signals (Concurrent DA04H, 16 bits), and delivered to each ear via calibrated headphones (Realistic 40–1377; RadioShack, Ft. Worth, TX) attached to the hollow ear bars. Custom-built attenuators controlled the sound pressure level (SPL) with 0.1-dB resolution.
ITD tuning.
Neural sensitivity to ITD was assessed by systematically varying the ITD of a broadband noise stimulus (10-kHz bandwidth) to obtain a rate-ITD function. A continuous stimulus was created by concatenating 200-ms segments of noise that first increased and then decreased in ITD between −2,000 µs (ipsilateral ear leading) and +2,000 µs (contralateral ear leading) in 200-µs increments or from −3,000 µs to +3,000 µs in 300-µs increments. The entire 8.4-s stimulus (i.e., once up and once down in ITD) was repeated 10 times with no interruption. The stimulus level was almost always 50 dB SPL. A lower (≥40 dB SPL) or higher (≤70 dB SPL) level was used in a few cases when the firing rate was either saturated or too low, respectively. Each neuron was characterized by its best ITD (the ITD producing the largest firing rate) and its BFITD (a measure of frequency tuning derived from the quasiperiodicity of the rate-ITD curve, see Data Analysis). The worst ITD was defined as the location of the local minimum in firing rate closest to zero ITD.
Dichotic pitch stimuli.
Dichotic pitch stimuli (HP and BEP; Fig. 1) were synthesized by filtering broadband noise (10-kHz bandwidth) in the frequency domain to alter the interaural phase over a narrow frequency band centered on a boundary frequency, FB. The width of the transition band was set to 8% of FB because this bandwidth produces a strong Huggins pitch percept in human listeners (36). For BEP stimuli (Fig. 1A), the interaural phase transitioned either from 0 below the transition band to π radians (antiphasic) above the transition band (BEP+/−) or from π radians below the transition band to 0 above the transition band (BEP−/+). The original HP stimulus (1) had a gradual phase change through 2π radians over the transition band. We used a simpler variant, sometimes called “binaural band pitch” (Fig. 1B), in which the interaural phase is constant over the transition band and differs by π radians from the phase outside the transition band (37–39). Specifically, the interaural phase was either 0 within the transition band and π radians outside of the transition band (HP+) or π radians within the transition band and 0 everywhere else (HP−). Psychophysical studies have shown that the variant of HP stimuli with constant phase over the transition band produces essentially equivalent percepts as the original HP with gradual phase transition (23, 38).
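The interaural phase configurations described above can be summarized as a function of frequency. The following Python sketch is illustrative only and is not the MATLAB synthesis code used in the study. The function name `ipd_profile`, the ASCII labels (`"HP+"`, `"HP-"`, `"BEP+/-"`, `"BEP-/+"`), and in particular the linear phase ramp across the BEP transition band are assumptions, since the text does not specify the shape of the BEP transition.

```python
import math

def ipd_profile(freq_hz, fb_hz, kind, rel_bw=0.08):
    """Interaural phase (radians) imposed at freq_hz for boundary frequency fb_hz.

    rel_bw is the transition bandwidth as a fraction of FB (8% in the text).
    Labels use ASCII hyphens: "HP+", "HP-", "BEP+/-", "BEP-/+".
    """
    lo = fb_hz * (1.0 - rel_bw / 2.0)   # lower edge of the transition band
    hi = fb_hz * (1.0 + rel_bw / 2.0)   # upper edge of the transition band
    inside = lo <= freq_hz <= hi

    if kind == "HP+":                   # in phase inside the band, antiphasic outside
        return 0.0 if inside else math.pi
    if kind == "HP-":                   # antiphasic inside the band, in phase outside
        return math.pi if inside else 0.0
    if kind in ("BEP+/-", "BEP-/+"):
        below, above = (0.0, math.pi) if kind == "BEP+/-" else (math.pi, 0.0)
        if freq_hz < lo:
            return below
        if freq_hz > hi:
            return above
        # Assumption: linear phase ramp across the transition band.
        frac = (freq_hz - lo) / (hi - lo)
        return below + frac * (above - below)
    raise ValueError("unknown stimulus type: " + kind)
```

For FB = 500 Hz, the transition band spans 480–520 Hz, so `ipd_profile(500, 500, "HP-")` returns π whereas `ipd_profile(300, 500, "HP-")` returns 0.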
We synthesized continuous HP or BEP stimuli in which FB varied periodically around the neuron’s BFITD to characterize the coding of dichotic pitch by individual neurons. This choice of a dynamic stimulus was motivated by prior use of binaural beats in which the interaural phase varies continuously to efficiently characterize neural ITD sensitivity (40) and use of harmonic complex tones with continuously varying fundamental frequency to characterize the neural coding of monaural pitch (41). Specifically, continuous dichotic pitch stimuli were constructed by concatenating 200-ms “notes” such that FB increased from 100 Hz to 1,000 Hz and then decreased back to 100 Hz, in steps of 100 Hz (4 s total). The notes were overlapped by 10 ms to avoid perceptual artifacts and ramped linearly over the overlap region to maintain constant power throughout. Additional stimuli were created with frequency ranges of 500–1,500 Hz and 1,000–2,000 Hz. Each neuron was tested with one or more of these stimuli selected so that FB would span the neuron’s BFITD. The 4-s stimulus was presented 10 times with no interruption, usually at a stimulus level of 50 dB SPL. To maximize the expected effect of varying FB on firing rates, an ITD equal to the neuron’s best ITD was applied to the entire sequence of dichotic pitch stimuli. The ITD was imposed upon the entire waveform, including the transition band and the noise carrier outside the transition band. In humans, such imposed delays do not alter the dichotic pitch frequency or pitch strength, at least for ITDs ≤ 1.5 ms (42–44). The measurement was typically then repeated by imposing an interaural delay equal to the worst ITD. In a few cases (n = 20), the ITD was systematically varied over a wider range (typically, −2,000 µs to +2,000 µs in 400-µs steps) to obtain a two-dimensional (2-D) array of firing rates versus FB and ITD. Usually, responses were collected in the following order: HP+, HP−, BEP+/−, BEP−/+.
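The up-and-down structure of the continuous stimuli can be checked with a short calculation. The Python sketch below is illustrative; the helper name `up_down_schedule` is hypothetical, and each note is treated as occupying one 200-ms slot, ignoring the 10-ms overlap ramps. It reproduces the 4-s duration of the FB sequence and the 8.4-s duration of the ITD sequence described under ITD tuning.

```python
def up_down_schedule(start, stop, step):
    """Values visited once ascending and once descending (endpoints repeated)."""
    n = round((stop - start) / step) + 1
    up = [start + i * step for i in range(n)]
    return up + up[::-1]

NOTE_DUR_S = 0.2  # 200-ms segments, as in the text

fb_seq_hz = up_down_schedule(100, 1000, 100)      # boundary frequencies
itd_seq_us = up_down_schedule(-2000, 2000, 200)   # ITDs for the rate-ITD curve

fb_duration_s = len(fb_seq_hz) * NOTE_DUR_S       # 20 notes, about 4 s
itd_duration_s = len(itd_seq_us) * NOTE_DUR_S     # 42 segments, about 8.4 s
```

Every value appears exactly twice per sequence, once on each limb, which is what allows the ascending and descending responses to be averaged.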
Data Analysis
Rate functions.
The first step in the analysis was to compute neural firing rates as a function of FB for the dichotic pitch stimuli and as a function of ITD for the rate-ITD curves. Spike counts were averaged over each 200-ms segment of the stimulus. Every value of FB or ITD was presented twice, once each on the ascending and descending limbs of the stimulus; firing rates were averaged over both instances and all 10 repetitions of the sequence. Standard deviations of the firing rates were calculated by treating all 20 instances (10 up, 10 down) as statistically independent.
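As a minimal sketch of this averaging step, the Python function below pools spike counts by stimulus value across both limbs of the sequence and all repetitions. The function name, argument layout, and the use of the sample (n − 1) standard deviation are assumptions made for illustration, not details taken from the original analysis code.

```python
from statistics import mean, stdev

def rate_function(segment_values, spike_counts, seg_dur_s=0.2):
    """Mean and SD of firing rate (spikes/s) for each stimulus value.

    segment_values: stimulus value (FB or ITD) of each segment in one
        up-and-down sequence, so each value occurs twice.
    spike_counts: one list per repetition, giving the spike count in
        each segment of that repetition.
    """
    rates_by_value = {}
    for repetition in spike_counts:
        for value, count in zip(segment_values, repetition):
            rates_by_value.setdefault(value, []).append(count / seg_dur_s)
    # All instances of a value (both limbs, all repetitions) are treated
    # as statistically independent samples.
    return {v: (mean(r), stdev(r)) for v, r in sorted(rates_by_value.items())}
```

With 10 repetitions of an up-and-down sequence, each entry is computed from 20 samples, matching the 20 instances described above.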
Estimation of BFITD with cross-correlation model.
Rate-ITD curves of low-frequency IC neurons for broadband noise stimuli show a quasiperiodicity that reflects cochlear frequency selectivity (45, 46). For each neuron, we used a model of ITD processing based on interaural cross-correlation (35) to fit the rate-ITD curve and estimate a measure of frequency tuning called BFITD. Because dichotic pitch stimuli are broadband, the BFITD measure derived from responses to binaural broadband noise is likely to be more useful in characterizing frequency selectivity for dichotic pitch stimuli than measures based on responses to pure tones.
Briefly, the cross-correlation model for each IC neuron has five parameters (BFITD, τ0, CD, A, B). [The model version used by Hancock and Delgutte (35) had a sixth parameter, the characteristic phase, which was set to 0 for the present analyses]. Frequency tuning is imposed by identical gammatone filters (47) at the two ears, with center frequency BFITD and time constant τ0, which is inversely related to the bandwidth. An internal characteristic delay, CD, is applied to the contralateral side. The interaural cross-correlation, ρ, between the filtered and delayed signals is then computed over the entire stimulus duration and converted to a firing rate, R, by means of a quadratic function:
| (1) |
The model was fit to single-unit data with the lsqnonlin function in MATLAB. Figures 2, A and B, and 4, A and B, show rate-ITD curves of four IC neurons and the corresponding best-fit curves for the cross-correlation models. The quality of these fits is representative: on average, the cross-correlation model accounted for 93% of the variance in the rate-ITD curves (35).
Figure 2.
Responses of two inferior colliculus (IC) neurons to binaural edge pitch (BEP) stimuli. A and B: rate-interaural time difference (ITD) curves for broadband noise (blue circles) of both neurons are quasiperiodic at best frequency (BF) estimated as a function of ITD (BFITD). The values of BFITD (A: 772 Hz, B: 590 Hz) were estimated from best fits of the cross-correlation model to the rate-ITD curves (blue line). The best and worst ITDs are marked by upward and downward triangles, respectively (A: best ITD = 290 µs, worst ITD = −200 µs; B: best ITD = 410 µs, worst ITD = −500 µs). C and D: responses of the two neurons as a function of boundary frequency (FB) to BEP+/− (dashed lines) and BEP−/+ (solid lines) stimuli presented at the best ITD (upward triangles) and the worst ITD (downward triangles). Consistent with predictions of the cross-correlation model (Fig. 1 and Table 2), firing rates show a falling edge around BFITD for BEP−/+ at the best ITD and BEP+/− at the worst ITD (orange curves). In contrast, firing rates show a rising edge around BFITD for BEP−/+ at the worst ITD and BEP+/− at the best ITD (green curves).
Figure 4.
Responses of two inferior colliculus (IC) neurons to Huggins pitch (HP) stimuli. A and B: broadband noise rate-interaural time difference (ITD) curves (blue circles) are quasiperiodic at best frequency estimated as function of ITD (BFITD). Values of BFITD (A: 749 Hz, B: 472 Hz) were estimated from best fits of the cross-correlation model (blue line). The best and worst ITDs are marked by upward and downward triangles, respectively (A: best ITD = 450 µs, worst ITD = −300 µs; B: best ITD = −800 µs, worst ITD = 400 µs). C and D: responses of the two neurons to HP+ (dashed lines) and HP− (solid lines) stimuli presented at the best ITD (upward triangles) and the worst ITD (downward triangles). Consistent with predictions of the cross-correlation model, firing rates exhibit a trough when boundary frequency (FB) is near BFITD for HP− at the best ITD and HP+ at the worst ITD (orange curves). Firing rates peak when FB is near BFITD for the complementary stimulus configurations (green curves).
Summary metrics of responses to dichotic pitch stimuli.
For each neuron, summary metrics of responses to dichotic pitch stimuli were obtained from smooth curves fit to the rate vs. FB data. All curve fits were performed with the lsqcurvefit function in MATLAB. BEP responses were fit with a sigmoid:
$$R(F_B) = \frac{a}{1 + e^{-(F_B - b)/c}} + d \tag{2}$$
where R is the firing rate, FB is the boundary frequency, and a, b, c, and d are parameters fit to the data. Responses were characterized by the value of FB corresponding to the steepest change in firing rate (the “edge” frequency, Fedge) and the signed value of the slope at that point, Sedge (see Fig. 3A).
Figure 3.

Summary of inferior colliculus (IC) neuron responses to binaural edge pitch (BEP) stimuli. Data from 64 measurement runs in 22 neurons. A: for each neuron, profiles of firing rates vs. boundary frequency (FB) were fit by sigmoidal curves from which 2 parameters were extracted: the “edge” frequency at which the slope has the largest absolute value (Fedge) and the signed value of the slope at that point (Sedge). (Same neuron as shown in Fig. 2, B and D.) B: the sign of the slope is consistent with the predictions of the cross-correlation model for various combinations of stimulus phase configuration and imposed interaural time difference (ITD) (Table 2). BD, best ITD; WD, worst ITD. C: the frequency of the edge in firing rate is highly correlated with best frequency estimated as function of ITD (BFITD; estimated from fit of cross-correlation model to neuron’s broadband noise rate-ITD curve). D: maximum absolute edge slope decreases with increasing bandwidth (BW) of cross-correlation model filter. E: distribution of neural just noticeable differences (JNDs) in edge pitch frequency, plotted separately for each stimulus condition (same key as in B). Distributions are not significantly different. F: the neural JNDs for BEP frequency tend to increase as BFITD increases.
Responses to HP stimuli were fit with a Gaussian function using the MATLAB lsqcurvefit function:
$$R(F_B) = a \, \exp\!\left[-\frac{(F_B - b)^2}{2c^2}\right] + d \tag{3}$$
and characterized by an “extremum frequency,” Fextr, corresponding to the mean value of the Gaussian (i.e., the parameter b), and a “rate change at the extremum,” Rextr, the signed amplitude of the Gaussian at that point (i.e., a) (see Fig. 5A).
Figure 5.

Summary of neural data for Huggins pitch (HP) stimuli. Data from 162 measurement runs from 63 units. A: profiles of firing rate vs. boundary frequency (FB) were fit to Gaussian curves from which 2 parameters were extracted: the extremum frequency (Fextr, mean of the Gaussian) and the signed rate change (Rextr, the signed amplitude of the Gaussian at that point). B: the type of extremum (as coded by the sign of Rextr) is generally consistent with predictions of the cross-correlation model for various combinations of stimulus phase configuration and imposed interaural time difference (ITD). BD, best ITD; WD, worst ITD. C: Fextr is positively correlated with best frequency (BF) estimated as function of ITD (BFITD). D: absolute value of Rextr decreases with increasing bandwidth (BW) of the cochlear filter in the cross-correlation model. E: distribution of neural just noticeable differences (JNDs) for HP frequency estimated from Gaussian interpolations, plotted separately for stimulus phase configurations and ITDs predicted to produce rate troughs (red) and peaks (green). F: neural JNDs for HP frequency tend to increase with BFITD.
The curve fits were also used to compute a just noticeable difference (JND) in boundary frequency FB for each neuron. The firing rate-based discriminability, D′, of two boundary frequencies f and f + Δf is given by
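$$D'(f, \Delta f) = \frac{\left| R(f + \Delta f) - R(f) \right|}{\sqrt{\left[ \sigma^2(f + \Delta f) + \sigma^2(f) \right] / 2}} \tag{4}$$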
where the curve fits just described provide the values for the mean firing rate R(f). To obtain the variance in firing rate across stimulus trials a power law relationship between the firing rate and its variance was assumed:
$$\sigma^2(f) = \alpha \, R(f)^{k} \tag{5}$$
where the parameters α and k were fit individually to each data set from each neuron. The curve fits were sampled with 1-Hz resolution along the f-axis. Then, for each value of f, the smallest value of Δf was found for which D′(f, Δf) > 1. The neural JND was taken as the smallest value of Δf across all values of f.
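The JND search described above can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the original analysis code: the function name `neural_jnd` is hypothetical, and the discriminability is assumed to take the common form of a rate difference divided by the root of the mean of the two variances, with variance given by the power law α·R^k.

```python
import math

def neural_jnd(rate_fn, alpha, k, f_lo_hz, f_hi_hz):
    """Smallest frequency step (Hz) giving D' > 1 anywhere in [f_lo_hz, f_hi_hz].

    rate_fn: fitted curve returning mean firing rate at frequency f.
    alpha, k: power-law parameters (variance = alpha * rate**k); k >= 0 assumed.
    Returns None if no pair of frequencies in the range reaches D' > 1.
    """
    # Sample the fitted curve with 1-Hz resolution; clip negative rates to 0.
    rates = [max(rate_fn(f), 0.0) for f in range(f_lo_hz, f_hi_hz + 1)]
    variances = [alpha * r ** k for r in rates]
    best = None
    n = len(rates)
    for i in range(n):
        for j in range(i + 1, n):
            df = j - i
            if best is not None and df >= best:
                break  # cannot improve on the current JND from this f
            pooled_sd = math.sqrt(0.5 * (variances[i] + variances[j]))
            if pooled_sd > 0 and abs(rates[j] - rates[i]) / pooled_sd > 1.0:
                best = df
                break
    return best
```

For a fitted curve with slope 0.5 spikes/s per Hz and a constant standard deviation of 2 spikes/s (α = 4, k = 0), D′ equals 0.25·Δf, so the first step that strictly exceeds 1 is Δf = 5 Hz.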
Statistical methods.
All statistical analyses were performed with the MATLAB Statistics and Machine Learning toolbox (version 2011b; The MathWorks, Natick, MA).
Neural Population Model for Huggins Pitch Detection
A neural population model of ITD processing was used to relate single-neuron physiology to dichotic pitch perception. The model, described in detail in Hancock and Delgutte (35), comprises a 2-D grid of model neurons, each of which is an implementation of the cross-correlation model described above. We assume that the most functionally important differences between ITD-sensitive neurons are their frequency and ITD tuning and hence distribute these properties independently across the grid. Frequency tuning, represented by the parameter BFITD, systematically varies across the grid rows (25 Hz to 1,200 Hz in 25-Hz steps). ITD tuning, represented by the best IPD (equal to the product BFITD × best ITD), varies across the columns (−0.5 cycles to +1.0 cycles in 0.025-cycle steps), representing the IC on one side. The distribution is specified in terms of best IPD because that quantity is more nearly independent of BFITD than best ITD is (35, 48). The best IPD parameter is used to set the model neuron’s characteristic delay (CD = best IPD/BFITD). The remaining model parameters, governing filter bandwidth and the dependence of firing rate on interaural correlation, are all set to empirically determined constants (Table 1).
Table 1.
Parameters of the neural population model of ITD processing
| Model Component | Parameters | Values |
|---|---|---|
| Frequency tuning | BFITD | 25–1,200 Hz |
| | Q = τ0 × BFITD | 0.3 |
| Internal delay | Best IPD = CD × BFITD | −0.5 to +1 cycle |
| | CP | 0 |
| IACC to rate mapping | A | 30 |
| | B | 1 |
| Firing rate variability | α = variance/mean rate | 0.8 |
| Distribution weights | BFITD | lognormal (6.50, 0.514) |
| | Best IPD | normal (0.183, 0.143) |
BF, best frequency; BFITD, BF estimated as function of interaural time difference (ITD); CD, characteristic delay; CP, characteristic phase; IACC, interaural cross-correlation; IPD, interaural phase difference; Q, quality factor of gammatone filter; τ0, time constant of gammatone filter.
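The grid layout and distribution weights in Table 1 can be set up as follows. This Python sketch is illustrative only. It assumes that the lognormal parameters (6.50, 0.514) are the mean and standard deviation of ln(BFITD in Hz), that the normal parameters (0.183, 0.143) are the mean and standard deviation of best IPD in cycles, and that the weights are normalized to sum to 1 along each axis; none of these conventions is stated explicitly in the table.

```python
import math

def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

# Grid axes: BFITD from 25 to 1,200 Hz in 25-Hz steps (rows) and
# best IPD from -0.5 to +1.0 cycles in 0.025-cycle steps (columns).
bf_hz = [25.0 * i for i in range(1, 49)]
best_ipd_cyc = [round(-0.5 + 0.025 * j, 3) for j in range(61)]

# Characteristic delay in seconds: CD = best IPD / BFITD.
cd_s = [[ipd / bf for ipd in best_ipd_cyc] for bf in bf_hz]

# Distribution weights along each axis (assumed parameter conventions).
w_bf = normalize([lognormal_pdf(bf, 6.50, 0.514) for bf in bf_hz])
w_ipd = normalize([normal_pdf(ipd, 0.183, 0.143) for ipd in best_ipd_cyc])
```

Under these assumptions the grid has 48 × 61 model neurons, and a neuron with BFITD = 500 Hz and best IPD = 0.25 cycles has a characteristic delay of 500 µs.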
The inputs to the model are similar to the stimuli used in the experiment (HP and BEP stimuli with varying FB and broadband noise varied in ITD). The outputs are the firing rates of the model neurons in one IC displayed as heat maps as a function of BFITD and best IPD (see Fig. 7). We only explicitly modeled the neural population in one of the two ICs. Since all the stimuli presented to the model produce symmetric patterns of activity across the midline, the trends in predicted psychophysical performance would be identical if we also modeled the other IC.
Figure 7.
Neural population model of Huggins pitch (HP) detection. A: heat maps showing firing rates of a population of model neurons as a function of best frequency (BF) and best interaural phase difference (IPD). Model responses are shown for HP+ (left) and HP− (right) stimuli at two different boundary frequencies (top: FB = 500 Hz; bottom: FB = 200 Hz). B: maps of signed discriminability (D′) for HP detection computed by subtracting population responses to a broadband noise carrier (antiphasic noise for HP+, diotic noise for HP−) from the heat maps in A. Curves in the margins of the top right map show the distributions of BF and best IPD based on measurements from the inferior colliculus (IC) of anesthetized cats [Hancock and Delgutte (35)]. C: maps of signed D′ from B, weighted by both BF and best IPD distributions. Arrows point to regions of positive and negative D′ for the 200-Hz HP stimuli. D–G: model performance (percent correct for distinguishing an HP stimulus from broadband noise) computed by optimally combining D′ values across the entire neural population. Red curves show performance for HP−, blue curves for HP+. For each model version in D–F, the detection efficiency ϵ (the sole free parameter of the model) was chosen to make percent correct = 99% for HP− at 300 Hz. D: unweighted model shows little variation in performance between HP− and HP+ or with changes in FB. E: application of best IPD weighting alone yields better detection for HP− compared with HP+ for low FB. F: application of BF weighting alone yields markedly decreased detection performance for low FB but no difference between HP+ and HP−. G: joint application of BF and best IPD weights predicts both better detection of HP− and overall declining performance with decreasing FB, in qualitative agreement with human psychophysical data of Hartmann and Zhang (36) shown in H.
Huggins pitch detection is measured psychophysically by asking subjects either to discriminate HP− stimuli from diotic (N0) broadband noise (BBN) or to discriminate HP+ stimuli from interaurally antiphasic (Nπ) BBN (36). We simulated this task in the model by first computing, for each neuron, the signed discriminability D′ between an HP stimulus and the corresponding BBN “carrier.” For the neuron at position (i, j) in the IPD-BFITD grid:
| (6) |
The spike count variances for the HP and BBN stimuli were assumed to be proportional to the respective mean spike counts RHP and RBBN, with a proportionality factor of 0.8 that was determined empirically by fits to a large sample of IC neurons (35).
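As a rough illustration of this per-neuron computation, the following sketch evaluates a signed discriminability from two mean spike counts. It assumes a common form of the index, the difference of the means divided by the square root of the average variance, with each variance set to 0.8 times the corresponding mean. The exact normalization may differ from Eq. 6, and the function and constant names are hypothetical.

```python
import math

VARIANCE_FACTOR = 0.8  # variance-to-mean ratio quoted in the text


def signed_dprime(r_hp, r_bbn, factor=VARIANCE_FACTOR):
    """Signed discriminability between mean spike counts r_hp and r_bbn.

    Assumed form (not necessarily Eq. 6): difference of the means divided
    by the root of the average variance, with variance = factor * mean.
    """
    var_hp = factor * r_hp
    var_bbn = factor * r_bbn
    pooled = 0.5 * (var_hp + var_bbn)
    if pooled == 0.0:
        return 0.0  # both mean counts are zero: no information
    return (r_hp - r_bbn) / math.sqrt(pooled)
```

With this convention, a neuron that fires more for the HP stimulus than for the carrier has a positive D′, and a neuron that fires less has a negative D′ of the same magnitude.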
The empirically measured distributions of BFITD and best IPD were used to weight the individual D′ values according to the BFITD and best IPD of each model neuron. These empirical distributions were derived from the responses of 197 neurons in the IC of anesthetized cats recorded for several studies in our laboratory (33, 35, 49). A population D′ was then computed by optimally combining the weighted D′ values across all cells in the 2-D map (50):
| (7) |
Note that this combination rule assumes that the firing rates of model neurons are conditionally statistically independent. Percent correct detection was computed from the population D′, using the cumulative standard normal distribution. The “detection efficiency” ϵ is the only free parameter in the model and was selected to produce 99% correct detection for HP− at FB = 300 Hz, to anchor the predictions to a point in the data of Hartmann and Zhang (36). In alternate versions of the model, the BFITD and best IPD weights were separately replaced by uniform distributions to explore their effects on model performance.
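The combination and calibration steps can be sketched as follows. The sketch rests on several assumptions that are not confirmed by the text: the weights multiply each neuron's D′ before a root-sum-of-squares combination, ϵ scales the population D′ linearly, and percent correct in a two-interval task is Φ(D′/√2), where Φ is the cumulative standard normal distribution. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def population_dprime(dprime_map, w_bf, w_ipd, efficiency=1.0):
    """Combine per-neuron D' values over a BF x best-IPD grid.

    dprime_map[i][j] is the D' of the neuron at BF index i, IPD index j.
    Assumed rule: root-sum-of-squares of the weighted D' values (optimal
    for independent observations), scaled by the detection efficiency.
    """
    total = 0.0
    for i, row in enumerate(dprime_map):
        for j, d in enumerate(row):
            weighted = w_bf[i] * w_ipd[j] * d
            total += weighted * weighted
    return efficiency * math.sqrt(total)


def percent_correct(dprime):
    """Two-interval percent correct, assuming Pc = Phi(D' / sqrt(2))."""
    return 100.0 * _STD_NORMAL.cdf(dprime / math.sqrt(2.0))


def calibrate_efficiency(raw_dprime, target_percent=99.0):
    """Efficiency that brings percent_correct to target_percent."""
    needed = math.sqrt(2.0) * _STD_NORMAL.inv_cdf(target_percent / 100.0)
    return needed / raw_dprime
```

In this sketch, `calibrate_efficiency` would be called once per model version with the unscaled population D′ for HP− at FB = 300 Hz, and the resulting value would then be reused for every other stimulus condition.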
Model for estimation of Huggins pitch frequency.
To derive an estimate of the perceived pitch frequency based on responses of the neural population model to HP stimuli, additional constraints were imposed on how the information available in the 2-D map of model activity is read out. First, for each BF, a marginal firing rate RM was obtained by summing model firing rates along the best IPD axis after weighting by the best IPD distribution:
| (8) |
The variance of the marginal firing rate was derived by assuming that spike counts were statistically independent across neurons:
| (9) |
The same was done for the firing rates in response to the noise carrier, yielding the marginal rate RMBBN and its variance. Second, the marginal firing rates for the HP stimulus were subtracted from those for the noise carrier to yield a marginal D′ that was also weighted by the BF distribution:
| (10) |
The estimated pitch frequency was defined as the location of the maximum D′M along the best frequency axis.
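A minimal sketch of this readout is given below. It assumes that the best IPD weights enter the marginal rate linearly and the marginal variance quadratically (as follows from independence), that each neuron's variance is 0.8 times its mean count, and that the marginal D′ is normalized by the root of the average variance. The sign convention (carrier rate minus HP rate) follows the wording of the text. Function names and the data layout are hypothetical.

```python
import math


def marginal_rate_and_variance(rates, w_ipd, factor=0.8):
    """Collapse a BF x best-IPD rate map along the best-IPD axis.

    rates[i][j] is the mean count at BF index i and IPD index j.
    Assumes independent neurons with variance = factor * mean.
    """
    r_m = [sum(w * r for w, r in zip(w_ipd, row)) for row in rates]
    v_m = [sum(w * w * factor * r for w, r in zip(w_ipd, row)) for row in rates]
    return r_m, v_m


def estimate_pitch(bfs, rates_hp, rates_bbn, w_bf, w_ipd):
    """Return the BF at which the BF-weighted marginal D' is largest."""
    r_hp, v_hp = marginal_rate_and_variance(rates_hp, w_ipd)
    r_bbn, v_bbn = marginal_rate_and_variance(rates_bbn, w_ipd)
    best_bf, best_d = None, -math.inf
    for i, bf in enumerate(bfs):
        denom = math.sqrt(0.5 * (v_hp[i] + v_bbn[i]))
        # HP marginal rate subtracted from the carrier marginal rate.
        d_m = w_bf[i] * (r_bbn[i] - r_hp[i]) / denom if denom > 0 else 0.0
        if d_m > best_d:
            best_bf, best_d = bf, d_m
    return best_bf
```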
RESULTS
We measured responses of ITD-sensitive IC neurons in anesthetized cats to dichotic pitch stimuli (HP or BEP) in which the boundary frequency FB was varied around each neuron’s best frequency (BFITD) estimated from responses to broadband noise as a function of ITD. Because determination of BFITD required oscillations in the rate-ITD curves characteristic of ITD sensitivity to the temporal fine structure (46), the range of BFITD was limited to low frequencies (256–1,720 Hz). Responses to BEP stimuli were obtained in 64 measurement runs from 22 IC neurons. Measurement runs from the same neuron differed in either the interaural phase configuration (BEP+/− or BEP−/+) or the imposed ITD (best or worst). Responses to HP stimuli were obtained in 172 measurement runs from 73 neurons, where the runs differed in either phase configuration (HP+ or HP−) or imposed ITD.
In this section, we first describe responses to BEP stimuli and then responses to HP stimuli. We show that these responses are qualitatively consistent with predictions of a generic cross-correlation model of ITD processing. We then show that a population model of ITD-sensitive IC neurons with physiologically realistic distributions of internal delays and BFITD predicts trends in human psychophysical data for the detection of HP stimuli. But first, we state the predictions of the cross-correlation model for BEP and HP stimuli.
Predictions of the Cross-Correlation Model for Dichotic Pitch Stimuli
Predictions for BEP stimuli.
Figure 1C illustrates the predictions of the cross-correlation model of ITD processing for how rate responses to BEP+/− (Fig. 1C, left) and BEP−/+ (Fig. 1C, right) stimuli change as the boundary frequency FB is varied around the best frequency (BFITD) of an ITD-sensitive neuron. These qualitative predictions are not tied to any specific implementation of the cross-correlation model so long as three fundamental assumptions hold: 1) neurons exhibit frequency selectivity so that their responses are primarily determined by frequency components of the ear input signals near their BFITD; 2) each neuron exhibits an internal delay corresponding to its best ITD for broadband noise (with opposite sign); and 3) the neuron’s firing rate is a monotonically increasing function of the “effective” interaural cross-correlation (IACC) after taking into account the frequency selectivity and the internal delay.
Figure 1C, top, shows the frequency tuning curve (represented by a Gaussian centered at BFITD) of a hypothetical neuron along with the interaural phase spectra of BEP+/− (Fig. 1C, top left) and BEP−/+ (Fig. 1C, top right) stimuli for three values of FB spanning the neuron’s BFITD. In Fig. 1C, a and b, an interaural delay equal to the neuron’s best ITD was imposed on the BEP stimuli to compensate for the internal delay and thereby maximize the effect of manipulating FB on the firing rate (24, 45). For a BEP+/− stimulus with FB ≪ BFITD (Fig. 1C, left arrow), the interaural phase is π in the vicinity of BFITD so that the effective interaural correlation is near −1 and the firing rate is expected to be low. In contrast, for FB ≫ BFITD (Fig. 1C, right arrow), the interaural phase near BFITD is 0, the effective interaural correlation is near +1, and the firing rate is expected to be high. When FB ≈ BFITD (Fig. 1C, center arrow), the neuron is driven by a mixture of homophasic and antiphasic inputs, so the firing rate is expected to be intermediate. Thus, the predicted profile of firing rate as a function of FB (Fig. 1Ca) shows a rising edge near BFITD for a BEP+/− stimulus with an imposed delay equal to the best ITD. By mirror image symmetry in the interaural phase spectrum, the opposite pattern, i.e., a falling edge in firing rate, is predicted for a BEP−/+ stimulus presented at the best ITD (Fig. 1Cb).
Because cochlear filtering transforms the broadband noise inputs into narrowband noises with a quasiperiodicity at BFITD, imposing an interaural delay equal to the worst ITD (where the firing rate is minimum) on BEP stimuli is approximately equivalent to introducing a π phase shift relative to the case when the BEP stimulus is presented at the best ITD. Therefore, the cross-correlation model predicts the opposite pattern for BEP stimuli presented at the worst ITD (Fig. 1, Cc and Cd) from predictions for stimuli presented at the best ITD: a falling edge in firing rate is predicted for BEP+/− stimuli, and a rising edge is predicted for BEP−/+.
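The reasoning above can be illustrated with a toy calculation. The sketch assumes a Gaussian power weighting centered at the BF, an ideal step in interaural phase at FB (0 below and π above for BEP+/−, the reverse for BEP−/+), and an extra π phase shift as a stand-in for presentation at the worst ITD. The effective correlation is taken as the power-weighted mean of the cosine of the interaural phase; because firing rate is assumed to increase monotonically with this quantity, the direction of the edge in firing rate follows the direction of the edge in correlation. The filter bandwidth and frequency grid are arbitrary illustrative choices.

```python
import math


def effective_correlation(fb, bf, bw, phase_below, phase_above, extra_phase=0.0):
    """Power-weighted mean of cos(interaural phase) through a Gaussian filter.

    phase_below / phase_above: interaural phase (rad) below and above fb.
    extra_phase: set to pi to approximate presentation at the worst ITD.
    """
    num = den = 0.0
    for k in range(int(8 * bw) + 1):          # 1-Hz steps over +/- 4 SD
        f = bf - 4.0 * bw + k
        w = math.exp(-0.5 * ((f - bf) / bw) ** 2)
        phase = phase_below if f < fb else phase_above
        num += w * math.cos(phase + extra_phase)
        den += w
    return num / den


def bep_profile(kind, worst_itd=False, bf=500.0, bw=40.0):
    """Effective correlation versus FB for kind '+/-' or '-/+'."""
    below, above = (0.0, math.pi) if kind == "+/-" else (math.pi, 0.0)
    extra = math.pi if worst_itd else 0.0
    fbs = [bf + step for step in range(-150, 151, 25)]
    return [effective_correlation(fb, bf, bw, below, above, extra) for fb in fbs]
```

Under these assumptions, `bep_profile("+/-")` rises from near −1 to near +1 as FB crosses the BF, whereas `bep_profile("-/+")` and `bep_profile("+/-", worst_itd=True)` fall over the same range.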
To summarize, the cross-correlation model makes two key predictions for the dependence of firing rate on FB for BEP stimuli: 1) An edge in firing rate is predicted when FB ≈ BFITD. 2) The edge direction (rising vs. falling) depends on both the interaural phase configuration (BEP+/− vs. BEP−/+) and the imposed ITD (best vs. worst). Specifically, rising edges (green in Fig. 1C and subsequent figures) are predicted for BEP+/− at the best ITD and BEP−/+ at the worst ITD. Falling edges (red in Fig. 1C) are predicted for BEP−/+ at the best ITD and BEP+/− at the worst ITD. These predictions are summarized in Table 2.
Table 2.
Predicted features in rate-FB profiles for various types of dichotic pitch stimuli
| Stimulus Type | Imposed ITD | Predicted Feature for FB ≈ BFITD |
|---|---|---|
| BEP+/− | Best ITD | Rising edge |
| BEP+/− | Worst ITD | Falling edge |
| BEP−/+ | Best ITD | Falling edge |
| BEP−/+ | Worst ITD | Rising edge |
| HP+ | Best ITD | Peak |
| HP+ | Worst ITD | Trough |
| HP− | Best ITD | Trough |
| HP− | Worst ITD | Peak |
BEP, binaural edge pitch; BF, best frequency; BFITD, BF estimated as function of interaural time difference (ITD); FB, boundary frequency; HP, Huggins pitch.
Predictions for HP stimuli.
Predictions of the cross-correlation model for how responses of IC neurons to HP stimuli vary as a function of FB (Fig. 1D) follow from the same three fundamental assumptions (frequency selectivity, internal delay, and monotonically increasing rate-IACC relationship) as predictions for BEP stimuli. When FB of an HP− stimulus presented at the best ITD is either well above or well below the neuron’s BFITD (Fig. 1Db), the inputs to the binaural neuron arrive in phase, the effective interaural correlation is near +1, and the firing rate is predicted to be high. When FB ≈ BFITD, the inputs become partially decorrelated because of the mixture of homophasic and antiphasic components within the filter passband, so the firing rate is expected to be lower, resulting in a trough in the profile of firing rate against FB. By symmetry, the opposite pattern (i.e., a peak in firing rate) is expected for an HP+ stimulus presented at the best ITD (Fig. 1Da). Presenting the HP stimuli at the worst ITD is approximately equivalent to an overall π interaural phase shift relative to the case when stimuli are presented at the best ITD, resulting in a reversal of the predicted firing rate patterns, i.e., peaks become troughs and vice versa (Fig. 1, Dc and Dd).
In summary, the cross-correlation model makes two predictions for the dependence of firing rate on FB for HP stimuli (Table 2): 1) An extremum in firing rate (peak or trough) is predicted when FB ≈ BFITD. 2) The type of extremum (peak vs. trough) depends on both the interaural phase configuration (HP+ vs. HP−) and the imposed ITD (best vs. worst). Specifically, peaks (green in Fig. 1D and subsequent figures) are predicted for HP+ stimuli at the best ITD and HP− at the worst ITD. Troughs (red in Fig. 1D) are predicted for HP+ at the worst ITD and HP− at the best ITD.
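A toy calculation in the same spirit illustrates the predicted extrema. The sketch assumes a Gaussian power weighting centered at the BF and an interaural phase that equals the carrier phase (0 for HP−, π for HP+) everywhere except in a narrow band centered at FB, where it ramps linearly through an additional 2π. The ramp shape, band width, and filter bandwidth are illustrative assumptions, and an extra π phase shift again stands in for presentation at the worst ITD.

```python
import math


def hp_phase(f, fb, carrier_phase, width):
    """Interaural phase of an HP-like stimulus (assumed linear 2*pi ramp)."""
    lo = fb - 0.5 * width
    if f <= lo or f >= lo + width:
        return carrier_phase
    return carrier_phase + 2.0 * math.pi * (f - lo) / width


def hp_effective_correlation(fb, bf, bw, carrier_phase, width, extra_phase=0.0):
    """Power-weighted mean of cos(interaural phase) through a Gaussian filter."""
    num = den = 0.0
    for k in range(int(8 * bw) + 1):          # 1-Hz steps over +/- 4 SD
        f = bf - 4.0 * bw + k
        w = math.exp(-0.5 * ((f - bf) / bw) ** 2)
        num += w * math.cos(hp_phase(f, fb, carrier_phase, width) + extra_phase)
        den += w
    return num / den
```

With a carrier phase of 0 (HP−), the effective correlation is close to +1 when FB is far from the BF and dips when FB ≈ BF, giving a trough. With a carrier phase of π (HP+), or with `extra_phase=math.pi`, the pattern is inverted.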
Neural Responses to BEP Stimuli Fit Predictions of the Cross-Correlation Model
Figure 2 shows the responses of two IC neurons to BEP stimuli with varying FB. The BFITDs estimated from fits of the cross-correlation model to rate-ITD curves for broadband noise (Fig. 2, A and B) were 772 and 590 Hz, respectively. Both neurons showed peak-type ITD tuning with best ITDs of +290 and +410 μs, respectively. Consistent with predictions of the cross-correlation model (Table 2), the firing rates of both neurons showed a falling edge when FB ≈ BFITD for BEP−/+ stimuli presented at the best ITD and BEP+/− stimuli at the worst ITD (Fig. 2, C and D). A rising edge was observed for BEP+/− stimuli at the best ITD and BEP−/+ at the worst ITD. The rising edge was not as prominent for the neuron of Fig. 2C because its BFITD was near the upper end of the FB range and because the rate-IACC function derived from the model fit was very expansive, resulting in the firing rate being almost 0 over a wide range of FB. Nevertheless, the overall pattern of results is consistent with predictions of the cross-correlation model for both neurons.
For each neuron and stimulus condition, a sigmoid curve was fit to the rate-FB profile (Fig. 3A), from which two features were extracted: the edge frequency, Fedge, where the firing rate changed most rapidly with FB, and the signed slope at the edge, Sedge, where the sign determines whether the edge is rising or falling. Figure 3C shows that, across our sample of 64 measurements from 22 neurons, the edge frequency was highly correlated with BFITD [regression slope 0.898, r2 = 0.656, F(1,52) = 99.4, P < 0.0001], as predicted by the cross-correlation model. Moreover, for BEP−/+ stimuli, the observed edge direction (rising or falling) was always as predicted by the cross-correlation model (Fig. 3B and Table 2): rising at worst ITD and falling at best ITD. For BEP+/− stimuli, the edge direction was as predicted by the model in all but two cases: 22/23 cases with rising edge for BEP+/− at best ITD and 15/16 cases with falling edge for BEP+/− at the worst ITD (Table 3). To assess the statistical significance of these results, we tested the null hypothesis that the edges were equally likely to occur in either direction. We computed the probability that the number of cases consistent with predictions would be at least as large as observed, assuming binomial distributions. The results of these tests (Table 3) led to rejection of the null hypothesis with high confidence (P = 0.001 or lower) for each of the four stimulus conditions as well as for the combined data across conditions.
Table 3.
Comparison of direction of edges in firing rate for BEP stimuli with cross-correlation model predictions
| Condition | BEP+/− @ BD | BEP+/− @ WD | BEP−/+ @ BD | BEP−/+ @ WD | Combined |
|---|---|---|---|---|---|
| Prediction | Rising | Falling | Falling | Rising | N/A |
| No. as predicted | 22 | 15 | 15 | 10 | 62 |
| No. cases | 23 | 16 | 15 | 10 | 64 |
| Binomial prob | <0.0001 | 0.0003 | <0.0001 | 0.0010 | <0.0001 |
BD, best interaural time difference (ITD); BEP, binaural edge pitch; N/A, not applicable; WD, worst ITD.
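The one-sided binomial probabilities in Table 3 can be checked with a short calculation: under the null hypothesis that rising and falling edges are equally likely, the probability of observing at least k of n cases in the predicted direction is the upper tail of a Binomial(n, 0.5) distribution. The sketch below is a plain restatement of that definition; the function name is hypothetical.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts taken from Table 3 (number as predicted, number of cases).
print(binomial_tail(15, 16))  # BEP+/- at worst ITD, about 0.0003
print(binomial_tail(10, 10))  # BEP-/+ at worst ITD, about 0.0010
```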
There were no significant differences between stimulus conditions in the median absolute values of the edge slopes (Kruskal–Wallis test, χ2 = 2.83, df = 3, P = 0.418, n = 64). However, the absolute value of the edge slope was inversely correlated with the cochlear filter bandwidth parameter of the cross-correlation model across the measurement sample [Fig. 3D; r2 = 0.157, F(1,62) = 11.6, P = 0.0012]. This result is expected, since neurons with wider bandwidths will see more gradual transitions in effective interaural correlation as FB crosses the BFITD and therefore shallower edges in firing rate. Thus, responses of our IC neurons to BEP stimuli are consistent with predictions of the cross-correlation model in every respect.
For each neuron and each stimulus condition, we used signal detection theory to compute neural just noticeable differences (JNDs) in boundary frequency FB based on firing rate (see methods) that can be compared with psychophysical data on FB discrimination for BEP stimuli (8) as well as variability in pitch matching data for BEP (2, 51). Our JND metric was evaluated for the reference FB that yielded the lowest (best) JND and therefore made no assumption about the BFITD of neurons that contribute to the discrimination. Neural JNDs varied widely across the neuronal sample from 2 to >300 Hz, with a median of 22 Hz (Fig. 3, E and F). Median neural JNDs did not significantly differ between the four stimulus conditions (BEP+/− or BEP−/+ at best or worst ITD) (Kruskal–Wallis test, χ2 = 2.46, df = 3, P = 0.483), showing that 1) FBs of BEP+/− and BEP−/+ stimuli are encoded equally well and 2) FB can be equally well encoded by rising and falling edges in firing rate. Across the measurement sample, there was a significant trend for neural JNDs to increase with BFITD [Fig. 3F; r2 = 0.123, F(1,62) = 8.7, P = 0.0044]. However, this trend vanished (r2 = 0.0058, P = 0.551) when the JNDs were expressed as a percentage of the geometric mean of the two bounding frequencies FB and FB + JND. These normalized JNDs ranged from 0.24% to 86%, with a median of 3.5%.
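The normalization described above amounts to dividing the JND by the geometric mean of the two bounding frequencies. A minimal sketch, with a hypothetical function name and an arbitrary example value for the reference FB:

```python
import math


def normalized_jnd_percent(fb, jnd):
    """JND as a percentage of the geometric mean of FB and FB + JND (Hz)."""
    return 100.0 * jnd / math.sqrt(fb * (fb + jnd))


# Illustrative values only: a 22-Hz JND at a reference FB of 600 Hz.
print(round(normalized_jnd_percent(600.0, 22.0), 2))
```

Using the geometric rather than the arithmetic mean makes the measure symmetric on a logarithmic frequency axis, which matters when the JND is a large fraction of FB.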
In summary, the boundary frequency FB of BEP stimuli is encoded by edges in the firing rate of individual IC neurons (as quantified by the neural JNDs) when FB crosses the neuron’s BFITD. Since the central nucleus of the IC is tonotopically organized (52, 53), we infer that the edges observed in rate responses of individual neurons will map onto corresponding edges (with opposite signs) along the tonotopic axis of the IC for a BEP stimulus with a given FB, thereby providing a rate-place code for FB. This code is qualitatively consistent with predictions of cross-correlation models of ITD processing incorporating frequency tuning and an internal delay.
Neural Responses to HP Stimuli Fit Predictions of the Cross-Correlation Model
Figure 4 shows the responses of two example IC neurons (BFITD of 749 and 472 Hz, respectively) to HP stimuli with varying FB. Both neurons showed peak-type ITD tuning with best ITDs of +450 and −800 μs, respectively (Fig. 4, A and B). Consistent with predictions of the cross-correlation model (Table 2), the firing rates of both neurons showed a trough when FB ≈ BFITD for HP− stimuli presented at the best ITD and HP+ stimuli at the worst ITD (Fig. 4, C and D). Both neurons also showed a small peak in firing rate for HP+ stimuli at the best ITD and HP− stimuli at the worst ITD. The peaks in Fig. 4C are not very prominent, especially for HP−, because the firing rates were very low for all values of FB. Such low firing rates were frequently observed across our neuronal sample, perhaps because the neurons’ firing rates were strongly adapted by the continuous nature of the stimulation. Nevertheless, the overall pattern of results is consistent with predictions of the cross-correlation model for both neurons.
For each neuron and each stimulus condition, a Gaussian curve was fit to the rate-FB profile to extract two metrics: the extremum frequency, Fextr (peak or trough), and the signed rate change at the extremum, Rextr, representing the extremum prominence relative to the baseline (see methods). The sign of Rextr specifies whether the extremum is a peak (positive) or a trough (negative). The cross-correlation model predicts that Fextr should be near BFITD and that the sign of Rextr should be as specified in Table 2. Consistent with the first prediction, the extremum frequency was significantly correlated with BFITD [Fig. 5C; r2 = 0.172, F(1,130) = 26.9, P < 0.0001], although the scatter around the line of equality was higher and the slope of the regression line (0.606) deviated further from unity than in the BEP data of Fig. 3C.
In a majority of cases, the sign of the rate change at extremum Rextr (Fig. 5B) was consistent with predictions of the cross-correlation model: positive (peak) for HP+ at the best ITD and HP− at the worst ITD, negative (trough) for HP− at the best ITD and HP+ at the worst ITD. However, statistical analysis using binomial tests on Rextr sign with the null hypothesis that peaks and troughs are equiprobable revealed a difference between peaks and troughs (Table 4): Whereas in conditions when a peak was predicted the observed proportion of peaks was significantly better than chance (P < 0.0001), this was not the case in conditions in which a trough was predicted (P = 0.292 and 0.218 for HP+ at worst ITD and HP− at best ITD, respectively). Nevertheless, when data from all four conditions were combined, the binomial test showed that the observed number of peaks and troughs that were consistent with model predictions was well above chance (P < 0.0001), offering overall support for the cross-correlation model. Across the 172 measurement runs, the absolute value of Rextr was inversely correlated with the filter bandwidth parameter of the cross-correlation model [Fig. 5D; r2 = 0.096, F(1,169) = 16.6, P < 0.0001]. This result is expected because neurons with sharper tuning will see a greater contrast in interaural correlation between the conditions when BFITD is located inside versus outside the transition band.
Table 4.
Comparison of direction of extrema in firing rate for HP stimuli with cross-correlation model predictions
| Condition | HP+ @ BD | HP+ @ WD | HP− @ BD | HP− @ WD | Combined |
|---|---|---|---|---|---|
| Prediction | Peak | Trough | Trough | Peak | N/A |
| No. as predicted | 40 | 17 | 33 | 34 | 124 |
| No. cases | 43 | 30 | 59 | 39 | 171 |
| Binomial prob | <0.0001 | 0.292 | 0.218 | <0.0001 | <0.0001 |
BD, best interaural time difference (ITD); HP, Huggins pitch; N/A, not applicable; WD, worst ITD.
Neural JNDs for HP boundary frequency FB were computed for each unit and HP condition in the same manner as for BEP stimuli. Neural JNDs varied widely across runs, from ∼16 Hz to 256 Hz, with a median of 65 Hz (Fig. 5E). No JND could be determined in 30/172 measurement runs because D′ was <1 for all possible pairs of FB. There was no significant difference in median JNDs between the four HP conditions (Kruskal–Wallis test, χ2 = 1.84, df = 3, P = 0.606), meaning that the HP boundary frequency is coded equally well for HP+ and HP− stimuli, and by peaks and troughs in firing rate. As was the case for BEP frequency, there was a positive correlation between neural JNDs and BFITD due to the lack of small JNDs for BFITD > 1,000 Hz [Fig. 5F, r2 = 0.113, F(1,170) = 16.5, P < 0.0001]. However, this trend did not hold when the JND was expressed as a percentage or Weber fraction (r2 = 0.0117, P = 0.212). Normalized JNDs varied from 1.8% to 56%, with a median of 9.4%.
In summary, the boundary frequency FB of HP stimuli is encoded by extrema in firing rate of both individual IC neurons and, by inference, across the tonotopic pattern of neural activity (since the extrema occur near BFITD). Although this rate-place code is qualitatively consistent with predictions of the cross-correlation model of ITD processing, the agreement with the model is not as clear-cut as for BEP stimuli and the code is also less robust, as evidenced by the higher neural JNDs for HP frequency compared with BEP.
Periodicity of HP-Rate-ITD Curves Depends on Boundary Frequency
In some neurons, we measured responses to HP stimuli over a dense 2-D grid of FB and ITD, allowing us to determine how the periodicity in rate-ITD curves for HP stimuli (henceforth called “HP-rate-ITD curves”), which reflects cochlear frequency selectivity, may depend on FB. Such dependence is expected from the cross-correlation model. Consider a band-pass filter representing cochlear tuning excited by broadband noise. The filter output is a narrowband noise centered at the filter’s center frequency BF. If a narrow band of noise centered at FB is removed from the input signal, the spectral center of gravity of the filter output will shift away from FB, toward the opposite side of the BF, and the amount of shift will be greater if FB is closer to the BF. For a binaural neuron in which firing rate is an increasing function of interaural correlation, decorrelating the ear input signals over a narrow band centered at FB, as occurs with HP stimuli, is akin to removing a narrow band from the input signals and is thus expected to produce shifts in the apparent frequency tuning inferred from the HP-rate-ITD curve.
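The band-removal argument can be illustrated numerically. The sketch below assumes a Gaussian power weighting as a stand-in for cochlear tuning, a flat input spectrum, and a rectangular notch. It computes the spectral center of gravity of the filter output with and without the notch. The filter shape, bandwidth, and notch width are illustrative choices, not parameters of the model used in the study.

```python
import math


def output_centroid(bf, bw, fb=None, notch_width=0.0):
    """Spectral centre of gravity of a Gaussian band-pass filter output.

    If fb is given, a band of notch_width Hz centred at fb is removed
    from the flat-spectrum input before filtering.
    """
    num = den = 0.0
    for k in range(int(8 * bw) + 1):          # 1-Hz steps over +/- 4 SD
        f = bf - 4.0 * bw + k
        if fb is not None and abs(f - fb) <= 0.5 * notch_width:
            continue  # this component is missing from the input
        w = math.exp(-0.5 * ((f - bf) / bw) ** 2)
        num += w * f
        den += w
    return num / den
```

For a filter centered at 750 Hz, a notch at 700 Hz moves the centroid above 750 Hz, a notch at 800 Hz moves it below, and a notch far out on the filter skirt (for example at 600 Hz) produces a much smaller shift.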
Figure 6A shows HP-rate-ITD curves of an IC neuron for HP+ stimuli with 10 different values of FB spanning the BFITD. The neuron’s BFITD determined by fitting the cross-correlation model to the rate-ITD curve for broadband noise (not shown) was 749 Hz. The quasiperiodicity in the HP-rate-ITD curves depends on FB, showing the shortest period for FB = 700 Hz, just below BFITD (cyan line in Fig. 6A) and the longest period for FB = 800 Hz, just above BFITD (dark blue curve in Fig. 6A). Figure 6B shows the HP-rate-ITD curves of the cross-correlation model with parameters fit to the broadband noise ITD curve for the same 10 HP+ stimuli as in Fig. 6A. Similar FB-dependent shifts in periodicity of the HP-rate-ITD curves are observed in the model response as in the data of Fig. 6A, showing that such shifts are an inherent property of the cross-correlation model.
Figure 6.
Periodicity of neural rate-interaural time difference (ITD) curves for Huggins pitch (HP) stimuli depends on dichotic pitch frequency. A: HP-rate-ITD curves for different values of HP+ boundary frequency, FB, for an example neuron [best frequency (BF) estimated as function of ITD (BFITD) = 749 Hz]. B: the cross-correlation model was fit to the rate-ITD curve obtained with broadband noise (i.e., no dichotic pitch, not shown) and then used to predict HP-rate-ITD curves for the same HP+ boundary frequencies as in A. The curves in A and B are color-coded by frequency as the data points in C. The ITD axis was sampled more finely for model predictions than in the neural data (50-µs vs. 300-µs resolution). The periodicity of the model HP-rate-ITD curves shows a similar dependence on FB as for the data in A. C: to quantify the periodicity of the rate-ITD curves for each FB, the cross-correlation model was fit to each HP-rate-ITD curve in A and B, holding all parameters fixed except the BF, whose best-fitting value was taken as the “dominant frequency,” BFFB, for that boundary frequency. Circles, data; squares, model. D: dominant frequency as a function of relative FB for 18 HP− runs and 9 HP+ runs from 20 units. Each circle represents BFFB for one HP-rate-ITD curve at one boundary frequency for one unit. FB and BFFB are expressed relative to each unit’s BFITD measured with broadband noise. Blue filled circles connected by solid lines show running medians computed with bin widths of 0.125. E: analysis similar to C, showing neural data and model predictions in response to a binaural edge pitch (BEP)−/+ stimulus for a different example unit.
To quantify the periodicity shifts, the cross-correlation model was refit to the HP-rate-ITD curve for each of the 10 HP stimuli in Fig. 6, A and B, keeping all the model parameters constant except the center frequency of the gammatone filter (which is by definition BFITD when the input is broadband noise). The best-fitting center frequency, BFFB, is called the “dominant frequency” of the corresponding HP-rate-ITD curve. Figure 6C shows the dominant frequency as a function of FB for both the neural data of Fig. 6A and the model data of Fig. 6B. For both data sets, the dominant frequency shifts up (meaning shorter periodicity) when FB is below the BFITD for broadband noise and shifts down when FB is above BFITD, and the shift magnitude is larger when FB is closer to BFITD. The dominant frequency shift is always of opposite sign from the difference FB − BFITD, consistent with the idea that decorrelating one narrow frequency band is akin to removing that band from the ear input signals for a binaural neuron sensitive to interaural correlation.
Figure 6D shows that the dominant frequency shifts observed in the neuron of Fig. 6A also hold across the 20 neurons (27 HP runs) in which a dense 2-D grid of FB and ITD was tested with HP stimuli. The normalized dominant frequency shift relative to the BFITD for broadband noise is plotted as a function of normalized FB expressed relative to BFITD. As in the example of Fig. 6C, the shifts are positive when FB < BFITD and negative when FB > BFITD, and the shift magnitudes are greater when FB is closer to BFITD. Figure 6E shows that shifts in dominant frequency can also be observed for a BEP−/+ stimulus, with a dependence on FB similar to that for HP stimuli. The data in Fig. 6E are from a different neuron than that in Fig. 6A. A dense grid of FB and ITD was rarely tested with BEP stimuli, precluding showing population data comparable to Fig. 6D.
The FB-dependent changes in the periodicity of HP-rate-ITD curves may provide cues to the dichotic pitch frequency in addition to the rate-place code described above. Such cues cannot be extracted from the response of an individual neuron to an HP or BEP stimulus presented at a particular ITD but could in principle be represented in the pattern of activity across an array of neurons tuned to different ITDs (a labeled-line code). Thus, the 2-D pattern of firing rates across a population of neurons tuned to both frequency and ITD contains multiple cues to the dichotic pitch frequency. These cues are explored further in Neural Population Model of ITD Processing Predicts Trends in Human Psychophysical Data, with a neural population model of ITD processing in which the behavior of individual model neurons is consistent with responses of IC neurons to HP and BEP stimuli.
Neural Population Model of ITD Processing Predicts Trends in Human Psychophysical Data
We implemented a neural population model of ITD processing to test whether trends in psychophysical performance for the detection of HP stimuli can be predicted from patterns of firing rates across a population of binaural IC neurons differing in both frequency tuning (BF) and ITD tuning (best IPD). The responses of individual model neurons were governed by the cross-correlation model and therefore embedded the key response properties of IC neurons for HP and BEP stimuli described above.
HP detection against broadband noise.
Figure 7 shows population model predictions for the task of detecting an HP stimulus against a broadband noise “carrier,” which has been tested in human psychophysical studies (36–38, 54). In the experiments of Hartmann and Zhang (36), the noise carrier against which HP stimuli had to be detected was diotic (No) for HP− stimuli, but antiphasic (Nπ) for HP+ stimuli, so that in both cases the information for detection was limited to a narrow frequency band around FB. In general, psychophysical performance for HP detection is best in a midfrequency region (300–800 Hz) and degrades on either side of this range.
Figure 7A shows the 2-D maps of firing rates of the model neural population as a function of BF and best IPD for both HP+ (Fig. 7A, left) and HP− (Fig. 7A, right) stimuli and for two different values of FB: 500 Hz (Fig. 7A, top) and 200 Hz (Fig. 7A, bottom). For both FBs, the frequency of the perceived HP+ pitch is marked primarily by a local decrease in firing rate (due to interaural decorrelation) for BFs near FB in the vertical bands of elevated firing rate produced by the Nπ noise carrier at best IPDs of ±0.5 cycle. There is also a local increase in firing rate near FB within the band of low firing rates centered at best IPDs near 0 cycle (white arrows in Fig. 7A, left). For HP− stimuli, the frequency of the perceived pitch is marked primarily by a local decrease in firing rate for BFs near FB in the main vertical band of elevated firing rate centered at 0 best IPD produced by the No noise carrier. There is also a local increase in firing rate near FB in the vertical band of low firing rates at best IPDs near 0.5 cycle (arrows in Fig. 7A, right).
Figure 7B shows the 2-D map of signed neural D′ produced by the model for discriminating an HP stimulus from the broadband noise carrier in a two-interval detection task similar to that of Hartmann and Zhang (36). The D′ patterns are shown for both HP+ and HP− stimuli with FB values of both 500 and 200 Hz. For all four stimulus conditions, the D′ pattern is complex, including both regions of positive D′ (green in Fig. 7B), where the firing rate for the HP stimulus is higher than firing rate for the No or Nπ noise carrier, as well as regions of negative D′, where the firing rate is lower for the HP stimulus than for the noise carrier. Importantly, these regions of positive and negative D′ (which both provide information for detection) are all centered at BFs near FB.
To obtain an overall measure of detection performance for the model, the D′s from all the model neurons were combined, using the optimal combination rule of signal detection theory for statistically independent variables (50) to yield a predicted percent correct detection score (see methods). Figure 7D shows the predicted detection score as a function of FB for both HP+ (blue line) and HP− stimuli (red line). In this version of the model, the neural D′s were combined across the entire range of BF and best IPD in Fig. 7B with no weighting (uniform distributions). The predicted detection performance is consistently high throughout the range of FB, without showing the dependence on FB and HP phase configuration observed in the psychophysical data (Fig. 7H).
The gray shaded curves in the margins of Fig. 7B, top right, show the distributions of BFITD and best IPD from a large sample of cat IC neurons studied in our laboratory. Figure 7C shows the 2-D pattern of neural D′ after application of weights based on these empirical distributions of BF and best IPD. The IPD weighting emphasizes best IPDs between 0 and +0.25 cycle, thereby simplifying the pattern for each stimulus to one region of positive D′ and one region of negative D′ (arrows in Fig. 7C, bottom). The negative D′ zone centered at best IPDs near 0 for the HP− stimulus is particularly prominent compared with both zones for the HP+ stimulus. In addition, the BF weighting emphasizes the mid-BF region, thereby enhancing the HP detection cues for FB = 500 Hz (Fig. 7C, top) relative to the cues for FB = 200 Hz (Fig. 7C, bottom). Thus, application of physiologically realistic distributions of BF and best IPD shapes the information available for detecting HP stimuli in an FB- and phase (HP+ vs. HP−)-dependent manner.
Figure 7, D–G, show predictions of HP detection performance for various versions of the neural population model in which weighting by physiological BF and best IPD distributions was applied first independently and then together. For each model version in Fig. 7, D–G, the sole free parameter of the model (the “detection efficiency” ϵ; see methods) was chosen to anchor the model performance to the mean performance of the human subjects of Hartmann and Zhang (36) for the HP− stimulus with FB = 300 Hz. Once set for a model version, ϵ was held constant for all stimulus conditions.
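One way to picture the anchoring of ϵ is sketched below. It assumes, purely for illustration, that ϵ scales the pooled D′ multiplicatively and that percent correct follows the textbook two-interval relation Φ(ϵD′/√2); the actual definition of ϵ is given in methods and may differ. All numerical values are hypothetical and are not taken from the model or from Hartmann and Zhang (36).

```python
import math
from statistics import NormalDist


def anchor_efficiency(model_dprime, target_percent_correct):
    """Solve Phi(eps * d' / sqrt(2)) = target for eps (assumed multiplicative scaling)."""
    z = NormalDist().inv_cdf(target_percent_correct / 100.0)
    return math.sqrt(2.0) * z / model_dprime


def predicted_percent_correct(model_dprime, eps):
    """Percent correct for a pooled d' scaled by the detection efficiency eps."""
    return 100.0 * NormalDist().cdf(eps * model_dprime / math.sqrt(2.0))


# Hypothetical anchor condition: pooled d' of 20 matched to 90% correct.
eps = anchor_efficiency(model_dprime=20.0, target_percent_correct=90.0)

# eps is then held fixed while the pooled d' varies across conditions.
for d in (5.0, 10.0, 20.0):
    print(d, round(predicted_percent_correct(d, eps), 1))
```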
As reported above, performance of the unweighted model (Fig. 7D) fails to show the dependence on FB and HP phase observed in the psychophysical data (Fig. 7H). Introduction of a realistic distribution of best IPD (Fig. 7E) makes predicted performance better for HP− stimuli than for HP+ stimuli, consistent with psychophysics, but fails to account for the dependence of performance on FB. Conversely, introduction of a realistic neuronal distribution across BF (Fig. 7F) makes predicted performance improve with increasing FB between 100 and 300 Hz, as observed in the psychophysical data, but fails to account for the effect of HP phase configuration. Finally, application of physiologically realistic weights with respect to both BF and best IPD (Fig. 7G) allows the model to predict both the dependence of performance on FB and the higher performance for HP− compared with HP+, although the effect of phase configuration is not as strong as in the Hartmann and Zhang (36) psychophysical data (Fig. 7H). Thus, introduction of physiologically realistic distributions of BF and best IPD in a population model of binaural IC neurons allows the model to predict key trends in human perceptual detection of HP stimuli. We emphasize that the weights applied to the model were entirely based on physiological data and that no attempt was made to fit data to the model.
Constraints on Model Readout Improve Predictions of Psychophysical Performance
Although the neural population model with physiologically realistic BF and best IPD distributions predicts some trends in psychophysical performance, it still has several deficiencies. First, the difference in detection performance between HP− and HP+ observed in the psychophysical data is underestimated by the model. More importantly, the model offers no explicit means of predicting the perceived Huggins pitch frequency. In previous work with a similar binaural model (35), we showed that imposing constraints on how the information available in the 2-D map of model activity as a function of BF and best IPD is read out can improve predictions of psychophysical performance. Specifically, we showed that imposing the constraint of summing the model firing rates over the entire BF axis for each IPD before quantifying discriminability led to a better match between model predictions and human performance with respect to the dependence of ITD discrimination thresholds on the reference ITD for both pure tones and broadband noise. We therefore explored whether application of appropriate constraints on the model readout can improve the predictions for HP detection and also provide a method for estimating the Huggins pitch frequency. The specific constraint used by Hancock and Delgutte (35), summing firing rates over the BF axis, seems ill suited to HP detection, since the information for this task is limited to a narrow BF band near FB, and indeed was found to be ineffective when tested (not shown). Instead, we implemented the orthogonal constraint whereby model firing rates were summed over the best IPD axis for each BF before quantification of discriminability.
Figure 8A shows heat maps of the difference in firing rates between model responses to HP stimuli (left HP+, right HP−) and the corresponding broadband noise carriers for HP boundary frequency FB = 300 Hz. The rate difference patterns were weighted by the best IPD distribution. From these 2-D maps, a “marginal” D′ curve (blue and red curves on the right margin of each panel in Fig. 8A) was obtained by integrating the rate difference along the best IPD axis for each BF and then dividing by the standard deviation of the marginal rate (see methods). The marginal D′ values were weighted by the BF distribution and then optimally combined along the BF axis to obtain an overall D′ score (shown as numbers at top right of each panel in Fig. 8A), which was transformed into percent correct detection.
Figure 8.
Neural population model of Huggins pitch (HP) detection with constraints on readout. A: heat maps of firing rate difference between the response to an HP stimulus and the response to the broadband noise carrier, weighted by the best interaural phase difference (IPD) distribution. Rate difference (ΔR) maps are shown for HP+ (left) and HP− (right) stimuli with a boundary frequency (FB) of 300 Hz. First, the weighted rate differences were summed across all best IPDs to obtain the marginal firing rates as a function of best frequency (BF). Marginal discriminability (D′) values (shown as blue and red curves on right margin of each panel) were then computed by dividing by the standard deviations of the marginal firing rates and weighting by the BF distribution. Finally, overall D′ values (shown at top right of each panel) were computed by optimally combining the marginal D′ over all BF. B: psychometric data showing percent correct discrimination of HP− and HP+ from their respective noise carriers as a function of FB for the model illustrated in A. C: pitch estimate (relative to FB) as a function of FB for HP+ and HP− stimuli. The estimated pitch is taken as the neural BF corresponding to the maximum value of the marginal D′ function (horizontal line in A).
Figure 8B shows the model percent correct detection score as a function of FB for HP+ and HP− stimuli, which can be compared to the Hartmann and Zhang (36) psychophysical data in Fig. 7H. This constrained, suboptimal model better predicts the trends in the psychophysical data than the optimal model of Fig. 7G with respect to the difference in detection performance between HP− and HP+. This result comes about because the zones of positive (green) and negative (magenta) rate differences in Fig. 8A are roughly balanced for HP+ stimuli, so that they tend to cancel out when the rate difference is integrated along the best IPD axis, whereas the negative zone clearly dominates over the positive zone for HP−, resulting in a rate difference of large magnitude after integration across best IPDs. This difference is in turn reflected in the larger magnitude of the marginal D′ for HP− (red curve on the right margin of Fig. 8A) compared with HP+ (blue curve). This cancellation of positive and negative rate differences did not occur for the model of Fig. 7 because the optimal combination rule sums the squared D′ so that rate differences of either sign always contribute positively to overall performance.
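The cancellation argument can be checked with a toy calculation. The sketch below compares the signed marginal D′ obtained by summing rate differences across best IPD at one BF with the root-sum-of-squares D′ of the optimal rule. It assumes independent neurons with equal standard deviation and omits the best IPD weights; the rate differences are hypothetical values chosen only to mimic a balanced (HP+-like) and a dominated (HP−-like) pattern.

```python
import math


def marginal_dprime(rate_diffs, sd_per_neuron):
    """Signed d' after summing rate differences over best IPD at one BF.

    Assumes independent neurons with equal SD, so the SD of the summed
    rate is sd_per_neuron * sqrt(N).
    """
    n = len(rate_diffs)
    return sum(rate_diffs) / (sd_per_neuron * math.sqrt(n))


def optimal_dprime(rate_diffs, sd_per_neuron):
    """Root-sum-of-squares of the single-neuron d' values."""
    return math.sqrt(sum((d / sd_per_neuron) ** 2 for d in rate_diffs))


balanced = [2.0, 2.0, -2.0, -2.0]    # HP+-like: positive and negative zones cancel
dominated = [-3.0, -3.0, -1.0, 1.0]  # HP−-like: negative zone dominates

for name, diffs in (("balanced", balanced), ("dominated", dominated)):
    print(name, marginal_dprime(diffs, 1.0), round(optimal_dprime(diffs, 1.0), 3))
```

In this toy case the two patterns give similar values under the optimal rule, but only the dominated pattern survives the summation across best IPD.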
Importantly, the constrained model of Fig. 8 also provides a method for determining the frequency of the perceived Huggins pitch from the 2-D pattern of firing rates. The map of rate differences for HP− in Fig. 8A shows a dominant zone of negative D′ with a centroid slightly above the 300 Hz FB. As a result, the marginal D′ curve for HP− shows a single mode at a BF of 315 Hz, which is taken as the estimated pitch frequency for the HP− stimulus. In contrast, the map of rate differences for HP+ in Fig. 8A shows roughly balanced zones of positive and negative values that are offset along the BF axis. As a result, the marginal D′ curve for HP+ shows two modes of unequal sizes corresponding to the zones of positive and negative rate differences. The largest mode centered at 260 Hz is the estimated pitch frequency for HP+.
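The pitch readout amounts to locating the largest mode of the marginal D′ curve. A minimal sketch follows, assuming that the mode is taken as the BF with the largest D′ magnitude (an assumption, since the dominant zone for HP− is negative). The BF grid and D′ values are hypothetical and were chosen only to reproduce the qualitative shapes described above.

```python
def estimate_pitch(bfs_hz, marginal_dprimes):
    """Return the BF at which the marginal d' has its largest magnitude."""
    best_index = max(range(len(bfs_hz)), key=lambda i: abs(marginal_dprimes[i]))
    return bfs_hz[best_index]


bfs = [240, 260, 280, 300, 315, 330, 350]             # hypothetical BF grid (Hz)
hp_minus = [-0.1, -0.3, -0.9, -1.8, -2.2, -1.5, -0.4]  # single negative mode
hp_plus = [0.6, 0.9, 0.4, -0.2, -0.6, -0.3, -0.1]      # two modes of unequal size

fb = 300.0
for name, curve in (("HP-", hp_minus), ("HP+", hp_plus)):
    pitch = estimate_pitch(bfs, curve)
    print(name, pitch, round(100.0 * (pitch - fb) / fb, 1), "% re FB")
```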
Figure 8C shows the model’s estimated pitch frequencies for HP+ and HP− stimuli as a function of FB. The pitch estimate for HP+ is consistently 10–15% below FB over the entire frequency range. In contrast, the estimated pitch for HP− shows a decaying trend from ∼10% above FB at low frequencies to slightly below FB above 500 Hz. This decay in the HP− pitch estimate is due to weighting by the BF distribution that was imposed on the marginal D′. If the weighting is omitted, the estimated HP− pitch stays at 2–3% above FB over the entire frequency range (not shown). However, there is no theoretical justification for omitting the BF weighting since the BF distribution of ITD-sensitive neurons in IC is an important physiological observation that plays a critical role in predicting the frequency dependence of HP detection in the model.
The deviations of the model’s pitch estimates from FB (up to 10–15%), and even more so the difference in estimated pitches between HP− and HP+ (>20% at low frequencies), are well above a semitone (6%) and therefore should be easily detectable by human listeners. To our knowledge, such deviations have not been reported in the psychophysical literature, a point to which we return in discussion.
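For reference, the conversion between percent frequency deviation and semitones can be sketched as follows, assuming equal-tempered semitones (a frequency ratio of 2^(1/12), about 6%). The helper name is hypothetical.

```python
import math

SEMITONE_RATIO = 2.0 ** (1.0 / 12.0)  # equal-tempered semitone


def percent_to_semitones(percent_deviation):
    """Convert a percent frequency deviation into a signed number of semitones."""
    return 12.0 * math.log2(1.0 + percent_deviation / 100.0)


print(round(100.0 * (SEMITONE_RATIO - 1.0), 2))  # size of one semitone in percent
for p in (10.0, -15.0, 20.0):
    print(p, round(percent_to_semitones(p), 2))
```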
DISCUSSION
We measured responses of IC neurons in anesthetized cats to noise stimuli that evoke a strong dichotic pitch (HP or BEP) in human listeners. The boundary frequency (which approximately corresponds to the frequency of the perceived pitch) was systematically varied around each neuron’s best frequency to identify response features associated with pitch. In response to BEP stimuli, neurons showed an edge in firing rate when the boundary frequency was near the BF, and the edge direction (rising or falling) depended on both the interaural phase configuration (BEP+/− vs. BEP−/+) and the imposed ITD (best vs. worst). Similarly, responses to HP stimuli showed an extremum in firing rate for boundary frequencies near the BF, with the type of extremum (peak or trough) depending on the phase configuration (HP+ vs. HP−) and imposed ITD. For the most part, the directions of the edges and extrema in firing rate were consistent with predictions of a model of ITD processing incorporating both frequency tuning and interaural cross-correlation, although the agreement was tighter for BEP than for HP. A neural population model of ITD processing in which the behavior of individual neural elements was governed by the cross-correlation model predicted key trends in HP detection by human listeners but only when the model incorporated physiologically realistic distributions of frequency and ITD tuning. Imposing additional constraints on how the map of model activity is read out further improved agreement with psychophysical data for HP detection and led to the novel prediction that the perceived pitch of HP stimuli may differ between the two phase configurations (HP+ and HP−) and markedly deviate from the boundary frequency. To our knowledge, this is the first combined single-unit and modeling study of responses of auditory neurons to stimuli evoking a dichotic pitch.
Agreement of Neural Data with Cross-Correlation Model
We identified a variety of cues to the dichotic pitch frequency in the firing rate responses of IC neurons to HP and BEP stimuli. Responses to HP stimuli showed either peaks or troughs in firing rate when the boundary frequency FB crossed the neuron’s BFITD determined from rate-ITD curves for broadband noise. Responses to BEP stimuli showed either rising or falling edges in firing rate when FB crossed BFITD. In general, the directions of the extrema (for HP) and edges (for BEP) were consistent with predictions of a generic cross-correlation model of ITD processing incorporating frequency tuning, an internal delay, and a monotonically increasing relationship between firing rate and effective interaural correlation. The cross-correlation model also qualitatively predicted the apparent changes in frequency tuning observed in rate-ITD curves for HP and BEP stimuli with different FB (Fig. 6). These changes in the dominant frequency of rate-ITD curves with FB are important because they result in a tilt in the maps of rate differences between model responses to HP stimuli and the corresponding noise carrier (Fig. 8A) that shape the estimated pitch frequency in the model.
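The origin of the peaks and troughs can be illustrated with a stripped-down version of such a model. The sketch below is not our fitted model: it assumes a Gaussian frequency weighting with a hypothetical standard deviation, a linear 2π phase transition centered on FB with a width of 8% of FB, a stimulus delivered at the neuron's best ITD (so that the internal delay is compensated), and an arbitrary monotonic rate function.

```python
import math


def hp_ipd(freq_hz, fb_hz, rel_bw=0.08, plus=False):
    """Stimulus IPD (radians): linear 2*pi transition of width rel_bw * FB around FB.

    HP- rides on an No carrier; HP+ adds pi at all frequencies (Npi carrier).
    """
    lo = fb_hz * (1.0 - rel_bw / 2.0)
    hi = fb_hz * (1.0 + rel_bw / 2.0)
    if freq_hz <= lo:
        ipd = 0.0
    elif freq_hz >= hi:
        ipd = 2.0 * math.pi
    else:
        ipd = 2.0 * math.pi * (freq_hz - lo) / (hi - lo)
    return ipd + (math.pi if plus else 0.0)


def effective_correlation(bf_hz, fb_hz, filter_sd_hz, plus=False, n=2001):
    """Power-weighted mean of cos(IPD) within a Gaussian band centered on BF."""
    span = 4.0 * filter_sd_hz
    num = 0.0
    den = 0.0
    for k in range(n):
        f = bf_hz - span + 2.0 * span * k / (n - 1)
        w = math.exp(-0.5 * ((f - bf_hz) / filter_sd_hz) ** 2)
        num += w * math.cos(hp_ipd(f, fb_hz, plus=plus))
        den += w
    return num / den


def firing_rate(rho, max_rate=20.0):
    """Hypothetical monotonically increasing rate vs. correlation function."""
    return max_rate * ((1.0 + rho) / 2.0) ** 2


# Sweep FB around a 500-Hz BF: HP- gives a trough, HP+ a peak, both near BF.
for fb in (400.0, 450.0, 500.0, 550.0, 600.0):
    r_minus = firing_rate(effective_correlation(500.0, fb, 40.0))
    r_plus = firing_rate(effective_correlation(500.0, fb, 40.0, plus=True))
    print(fb, round(r_minus, 2), round(r_plus, 2))
```

Widening `filter_sd_hz` relative to the transition band shrinks the deviation of the correlation from its baseline, because a smaller fraction of the energy in the band falls within the phase transition.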
A cross-correlation model for responses of ITD-sensitive IC neurons was first proposed and tested using noise stimuli by Tom Yin and his colleagues (31). Later studies further supported this model (24, 35, 55), although it needs to be extended to account for ITD sensitivity to cochlea-generated envelopes (46, 56, 57) and adaptation to binaural statistics of the stimulus set (58). The cross-correlation model was also shown to predict correlates of binaural masking level differences (BMLDs) in the detection of tones in noise (33, 34, 59, 60). Here we show that this model also predicts key features of responses to noise stimuli that evoke a dichotic pitch.
Mc Laughlin et al. (24) also used BEP stimuli (which they called flip noise) in single-unit experiments in anesthetized cats aimed at comparing binaural bandwidths of IC neurons with monaural bandwidths of auditory nerve fibers. As in the present experiments, they varied the BEP boundary frequency FB (which they called “flip frequency”) around each neuron’s BF and imposed an external ITD equal to the neuron’s best ITD to maximize the effect of changing FB on the firing rates. They observed edges in firing rate when FB crossed the BF that were well fit by a cross-correlation model similar to ours. In particular, the frequency of the edge in firing rate for flip noise was tightly correlated with the characteristic frequency (CF) derived from pure-tone threshold curves (compare with our Fig. 3B), and the bandwidth derived from responses to flip noise increased with increasing CF, consistent with the decrease in edge slope magnitude with increasing BFITD in Fig. 3D. Thus, our results are in excellent agreement with those of Mc Laughlin et al. (24).
Although a vast majority of our cat IC neurons [as well as the neurons of Mc Laughlin et al. (24) for BEP] encoded the frequency of the dichotic pitch in their firing rates, only a small fraction of the guinea pig MSO neurons studied by Alsindi et al. (25) showed correlates of dichotic repetition pitch (DRP) in their interspike interval distributions, although their neurons consistently coded the frequency of monaural repetition pitch. As pointed out by Alsindi et al. (25), DRP is perceptually less salient than HP and BEP for human listeners, suggesting that neural correlates may be less prevalent or less prominent. Whereas we systematically varied the expected HP or BEP pitch frequency around each neuron’s BF to search for rate-place cues to pitch, Alsindi et al. (25) looked for temporal correlates of pitch using a fixed set of pitch frequencies for all neurons. It is also possible that weak cues to DRP available in MSO would be enhanced in the IC via either convergence of similarly tuned excitatory inputs or lateral inhibition from the dorsal nucleus of the lateral lemniscus.
Although there was good agreement between our neural data and predictions of the cross-correlation model for BEP stimuli, the agreement was not as strong for HP stimuli. Specifically, there was considerable scatter in the relationship between the extremum frequency and BFITD for HP stimuli (Fig. 5B). Moreover, the type of extremum (peak vs. trough) observed in response to HP stimuli was not always as predicted by the model, although overall the fraction of cases in which the types of extrema were as predicted was significantly above chance. There are several possible reasons for this poorer fit. First, we chose the transition bandwidth for HP stimuli to be 8% of FB because this value gives rise to a strong Huggins pitch in human listeners (e.g., Refs. 1, 8, 36, 54). However, cochlear filter bandwidths inferred from the phase of stimulus-frequency otoacoustic emissions (SFOAEs) are broader in cats and other small experimental animals than in humans (61) and wider than 8% over the BF range (<2 kHz) of our neural data. A transition bandwidth narrower than the bandwidth of the cochlear filter means that only a fraction of the energy within the cochlear filter centered at FB is available to produce the changes in interaural correlation underlying extrema in firing rate. Thus, our choice of a narrow transition bandwidth was likely suboptimal for eliciting large changes in firing rate with FB in cat binaural neurons.
A second reason why the cross-correlation model did not fit the data as well for HP compared with BEP is that the continuous presentation of our stimuli over several minutes likely resulted in long-term adaptation, and therefore low firing rates. This adaptation was likely more pronounced for HP stimuli than for BEP because the IACC alternates between high and low values for BEP stimuli, whereas the effective IACC only deviates from its baseline value for a narrow range of FB near the BF for HP stimuli. Indeed, both median firing rates and the range of firing rates across FB were smaller for HP stimuli than for BEP (median firing rates: 4.7 spikes/s for HP vs. 7.6 spikes/s for BEP; median range: 6.7 spikes/s for HP vs. 24.6 spikes/s for BEP; all differences P < 0.001 by Kruskal–Wallis tests). Together, the choice of a narrow transition width and the lower firing rates for HP stimuli due to long-term adaptation may have resulted in noisier data for HP stimuli compared with BEP, a reduced ability to reliably detect extrema in firing rate, and a less clear-cut test of model predictions.
Limitations of the Neurophysiological Study
The stimulus levels of our dichotic pitch stimuli (40–70 dB SPL range, with a mode at 50 dB SPL) are lower than those typically used in human studies of dichotic pitch perception, which tested levels up to at least 80 dB SPL, with no obvious dependence of pitch strength on stimulus level (1, 3, 36, 54). We expect the rate-place code for boundary frequency we documented to still be available at the levels used in psychophysical studies because the sensitivity of IC neurons to ITDs in the temporal fine structure of noise stimuli is known to be robust over a wide range of stimulus levels (45). This robustness is consistent with the fact that ITD sensitivity is dependent upon phase locking in the auditory nerve, which is also robust to variations in stimulus level (62).
Because our neurons’ rate-ITD curves had to show oscillations characteristic of ITD sensitivity to the temporal fine structure (46) in order to measure a BFITD, the highest BF in our sample of neurons (and therefore the highest FB for which we could demonstrate a neural correlate) was 1,760 Hz. Yet, dichotic pitch percepts have been reported at higher frequencies, as high as 2,400 Hz in some subjects for BEP (2) and as high as 3,200 Hz in one subject for HP (54). Although IC neurons sensitive to ITDs in the temporal fine structure become increasingly rare for BFs above 1,800 Hz, some neurons with BFs up to at least 10 kHz are sensitive to ITDs in the cochlea-induced envelope of noise stimuli (46, 57). These neurons sensitive to envelope ITDs might provide cues to dichotic pitches above 2 kHz because 1) they are frequency selective and 2) they are expected to be sensitive to interaural decorrelation in the envelope near FB, although these cues may not be as salient as the pitch cues available at lower frequencies where neurons are sensitive to ITD in the temporal fine structure.
Our experiments were performed in an anesthetized preparation. Although injection of anesthetics in rabbits has been shown to alter ITD tuning (63) and temporal properties (64) of individual IC neurons, across the population of IC neurons the characteristics of ITD tuning are broadly similar in awake and anesthetized preparations (65). The most obvious differences between the two types of preparation are the higher spontaneous and driven firing rates of IC neurons in the awake preparation (63–65). Overall, studies in awake preparations have not led to a fundamental reexamination of the cross-correlation model for sensitivity to ITD. We therefore expect our general pattern of results to remain valid if the experiments were repeated in an unanesthetized preparation. In addition to anesthesia, task engagement can also alter the responses of IC neurons (66–68), although the degree to which binaural properties may be affected is unknown.
We focused on cues to dichotic pitch available in the firing rates of IC neurons. Modeling studies (69, 70) suggest that cues to the pitch of BEP stimuli may also be found in the temporal firing patterns at the primary stage of binaural interactions in the MSO. Without discounting this possibility, the availability of temporal cues is much less likely in the IC, which is the site of a major degradation in phase locking to the temporal fine structure (71, 72). Testing for the availability of temporal cues to HP and BEP in our data would be difficult because of the low firing rates and modest number of stimulus repetitions. Temporal cues present an area of exploration for future studies of dichotic pitch.
The present study was conducted in cats, a species that is not known to exhibit dichotic pitch percepts. In fact, we are not aware of any behavioral study of dichotic pitch in nonhuman vertebrates. Nevertheless, the similarities across mammalian species in both brain stem binaural mechanisms (73–76) and behavior related to binaural detection (77–79) and ITD sensitivity (80–84) make it likely that cats and other mammals would demonstrate tonelike percepts in response to HP and BEP stimuli. The question might be addressed with stimulus generalization techniques like those used in studies of complex pitch perception (85–87).
Relation of Neural Discriminability with Psychophysical Detection and Discrimination
We used signal detection theory to compute neural JNDs for the boundary frequency FB of HP and BEP stimuli from rate responses of single IC neurons to facilitate comparison with psychophysical data on FB discrimination. Some trends in the neural JNDs paralleled trends in the psychophysical data. Psychophysical data on HP frequency discrimination (1, 36, 38, 88) show best performance between ∼200 Hz and 800 Hz, with a degradation on both sides of this range. Our neural JNDs for HP boundary frequency discrimination (Fig. 5F) show an increasing trend with BFITD, consistent with the degradation in performance at higher frequencies observed in the psychophysical data. The paucity of neurons with BFITD below 250 Hz in our sample makes it difficult to test whether neural JNDs would also increase at frequencies below 300 Hz.
Our data also show an increasing trend in neural JNDs for BEP boundary frequency (Fig. 3F). This trend reflects the shallower slopes of the edge in firing rates at higher BFITD (Fig. 3D), which in turn appears to result from the increase in cochlear bandwidths (24). Psychophysical data on BEP frequency discrimination are sparse. Klein and Hartmann (2) measured performance in the related task of pitch matching between a pure tone and a BEP stimulus. The pitch matches were most precise for boundary frequencies between 300 and 850 Hz and degraded on both sides of this range. This trend is again consistent with our neural JNDs given the limitation that neural data from low BF are lacking.
Despite a broad agreement between neural and psychophysical data with respect to the frequency dependence of discriminability, the two sets of data differ with respect to the effect of interaural phase configuration. Our neural JNDs for boundary frequency were much larger for HP stimuli than for BEP stimuli. This was true for both the median JNDs (9.4% for HP vs. 3.5% for BEP) and the “best” (10th percentile) JNDs (3.4% for HP vs. 0.71% for BEP), which may be more relevant to behavior according to the “lower envelope principle” (89). Although we are not aware of any direct comparison of perceptual JNDs between HP and BEP, data from related tasks such as pitch matching and melody recognition (2, 8, 38, 90) suggest that HP is somewhat easier to detect and discriminate than BEP, and therefore perceptual JNDs should be smaller for HP, which is the opposite of the trend observed in the neural data. There is a further difference between the neural and perceptual data in the effect of interaural phase configuration (HP+ vs. HP−). Hartmann and Zhang (36) found that both detection and frequency discrimination of HP stimuli at low frequencies were better for the HP− configuration than for the HP+ configuration. In contrast, we found no significant difference in neural JNDs between the two configurations (Fig. 5E). The different dependence of neural and perceptual JNDs on stimulus variables suggests that perceptual discrimination is not solely limited by the response properties of individual IC neurons but also by how the information available in the activity of the neural population is extracted and processed at later stages, an issue we addressed with the neural population model.
It is also of interest to compare the values of neural and perceptual JNDs, although such a comparison is fraught with difficulties. The neural JNDs for our HP stimuli are higher than the perceptual JNDs reported in the literature: The mean JND among 10 normal-hearing subjects tested by Santurette and Dau (38) was 2.3% for HP− stimuli at 500 Hz. The mean standard deviation of pitch matches between HP stimuli and a pure tone (which can be interpreted as a form of JND) was 1.2% for HP stimuli at 600 Hz in the study of Culling, Summerfield, and Marshall (8). These values are smaller than our median neural JNDs (9.4%) but not much lower than our 10th percentile JNDs (3.4%), which may be more behaviorally meaningful. These perceptual studies used longer-duration stimuli than the 200-ms duration of each “note” in our continuous HP stimuli. The discrepancy between neural and perceptual JNDs is reduced when these differences in duration are taken into account: Plack, Turgeon, Lancaster, Carlyon, and Gockel (88) report mean JNDs of ∼6% at 600 Hz for 200-ms HP− stimuli, which is close to our median neural JNDs.
Comparing our neural JNDs with those for other binaural tasks, HP frequency discrimination appears to be intermediate between ITD discrimination, where the best (lowest) neural JNDs in the IC of anesthetized guinea pigs are comparable to human perceptual JNDs, consistent with the lower envelope principle (91, 92), and discrimination of interaural correlation, where neural JNDs are substantially larger than perceptual JNDs (30, 93), so that considerable pooling of information across neurons is necessary to account for behavioral performance in this task.
In summary, although the increasing trend in neural JNDs for HP frequency with increasing FB is consistent with the degradation in perceptual discrimination performance at higher frequencies, our results contrast with psychophysical data in that neural JNDs were lower for BEP frequency discrimination than for HP frequency discrimination and were similar for HP− and HP+ stimuli. Comparisons of JND values between neural and perceptual data suggest that a modest pooling of information across neurons is necessary to account for perceptual performance. The neural population model discussed next addresses what form of processing might implement this pooling and how the different neural and perceptual trends with respect to the effect of the interaural phase can be reconciled.
Neural Population Model for Huggins Pitch Detection and Pitch Frequency Estimation
We implemented a neural population model of ITD processing (35) in which the behavior of individual neural elements was governed by a cross-correlation model consistent with responses of IC neurons, and the model neurons were arranged independently by BF and best IPD with physiologically realistic distributions. The model predicted trends in human psychophysical performance for HP detection as a function of boundary frequency and interaural phase configuration (HP+ vs. HP−).
The neural population model shows both similarities with and differences from other models for dichotic pitch phenomena. These models include the central auditory pattern (CAP) model (3, 42, 51) and the equalization-cancellation (EC) model, either in its original, single-channel (or broadband) form (2, 94) or in its multichannel form (8, 44, 95). The CAP model can be considered as an instance of the classic Jeffress (96) binaural cross-correlation model in the limit of very fine frequency resolution (8). The CAP model, the multichannel EC (mEC) model, and our neural population model all include a 2-D map of activity organized along axes of best frequency and internal delay, although our population model uses interaural phase (the product of BF and internal delay) rather than a frequency-independent delay. This choice was motivated by the observation that the best IPDs of IC neurons are more similarly distributed across frequency than the best ITDs, which are restricted to an increasingly narrow range at high frequencies as a result of the π limit (35, 48).
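The relation between the two coordinate systems can be made concrete with a short sketch. It uses the definition given above (best IPD in cycles equals BF times internal delay) and takes the π limit to mean that best IPDs lie within ±0.5 cycle, so that the largest best ITD is half a period of BF. The function names are hypothetical.

```python
def best_itd_us(best_ipd_cycles, bf_hz):
    """Internal delay (microseconds) equivalent to a best IPD (cycles) at a given BF."""
    return 1e6 * best_ipd_cycles / bf_hz


def pi_limit_us(bf_hz):
    """Largest |best ITD| (microseconds) allowed by the pi limit: half a period of BF."""
    return 1e6 * 0.5 / bf_hz


# The same best IPD of 0.25 cycle maps onto ever smaller ITDs as BF increases.
for bf in (250.0, 500.0, 1000.0, 1500.0):
    print(bf, round(best_itd_us(0.25, bf), 1), round(pi_limit_us(bf), 1))
```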
The physiologically realistic distributions of BF and best IPD included in our neural population model were essential for predicting the frequency dependence of HP detection performance and the better performance with HP− compared to HP+, respectively (Fig. 7). These distributions have parallels in other models. The BF distribution peaks for frequencies near 400 Hz, not unlike the dominance region for pitch in the CAP model (42), although the latter peaks near 500–600 Hz. The best IPD distribution for one IC peaks for IPDs between −0.1 and +0.3 cycle, but when both ICs are implemented the effect is to emphasize a broad central region between ±0.25 cycle. This is conceptually similar to the p(τ) function used to implement the centrality principle in Jeffress-type models of binaural processing (6, 97, 98), although again our model differs in its use of IPD instead of ITD as the coordinate of the centrality function. Importantly, whereas the p(τ) and frequency dominance functions used in other models were originally fit to psychophysical data, our best IPD and BF distributions were independently derived from physiological data from a large sample of IC neurons in cats.
We used the optimal combination rule of signal detection theory (50) to derive the overall performance of our neural population model from the performances (D′) of individual neurons (Eq. 7). This formula assumes that the neuronal firing rates are conditionally statistically independent. This assumption is supported by the observation that cross-correlation coefficients between the firing rates of simultaneously recorded IC neurons are generally small (99), although information on this issue is still very limited. The existence of noise correlations in the neuronal firing rates could affect model performance in complex ways (100, 101). The optimal rule for combining D′ across model neurons (Eq. 7) is an efficient way of computing overall model performance for comparison with psychophysical performance. The actual underlying computation is a weighted sum of firing rates followed by a threshold device (50, 102), which leads to Eq. 7 when the weights are chosen optimally. This computational structure is biologically plausible because it could in principle be implemented by convergence of model neurons with different synaptic strengths onto a central neuron.
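The equivalence between the weighted-sum readout and the root-sum-of-squares rule can be verified numerically. The sketch below assumes independent Gaussian-like rate variability and uses the standard optimal weights for that case, w = Δμ/σ² (mean rate difference over variance); these weights and all numerical values are illustrative assumptions rather than quantities taken from the model.

```python
import math


def linear_readout_dprime(weights, mean_diffs, sds):
    """d' of a weighted sum of independent firing rates."""
    signal = sum(w * m for w, m in zip(weights, mean_diffs))
    noise = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, sds)))
    return signal / noise


mean_diffs = [2.0, -1.0, 0.5]  # hypothetical rate differences (HP minus carrier)
sds = [1.0, 2.0, 0.5]          # hypothetical standard deviations of the rates

# Root-sum-of-squares of the single-neuron d' values.
rss = math.sqrt(sum((m / s) ** 2 for m, s in zip(mean_diffs, sds)))

# Optimal weights for independent variables: mean difference over variance.
optimal_weights = [m / s ** 2 for m, s in zip(mean_diffs, sds)]
uniform_weights = [1.0, 1.0, 1.0]

print(round(rss, 4))
print(round(linear_readout_dprime(optimal_weights, mean_diffs, sds), 4))
print(round(linear_readout_dprime(uniform_weights, mean_diffs, sds), 4))
```

Note that the optimal weight is negative for a neuron whose rate decreases in response to the HP stimulus, which is how rate differences of either sign come to contribute positively.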
The implementation of HP detection in the neural population model required subtraction of the firing rates of individual model neurons for an HP stimulus from the firing rates for a broadband noise carrier (No for HP−, Nπ for HP+). This approach directly parallels psychophysical experiments in which subjects are presented with two stimulus intervals on each trial, one interval containing an HP stimulus and the other one containing the noise carrier, and are asked to choose in which interval they hear a pitch (36, 38). However, HP can also be detected and perceived as a musical pitch on its own, in the absence of a contrasting noise carrier (1, 38, 90). Since information for HP detection is limited to a narrow frequency band near FB, it may be possible to approximate the pattern of activity produced by the noise carrier alone by interpolating the vertical ridges of activity in the BF-best IPD map for HP stimuli so as to fill in frequencies near FB. This interpolated carrier pattern could then be subtracted from the pattern for the HP stimulus as in the present implementation of the population model. This approach is conceptually similar to the “reconstruction-comparison” model of Akeroyd and Summerfield (44).
Modeling detection of BEP stimuli would require a different approach than subtracting the pattern of response to BEP from the pattern for a noise carrier because there is no appropriate carrier. For example, for a BEP+/− stimulus subtracting the response to No noise would leave a large residual at frequencies above FB, whereas subtracting the response to Nπ noise would leave a residual at low frequencies. In either case, information for detection would be available over a wide region of the BF-best IPD map, so that performance should be better than for HP detection, where information is confined to a narrow frequency band. This is contrary to psychophysical observations, which show that BEP is somewhat harder to detect than HP (2, 8, 38). A salient feature of the model response maps for BEP stimuli is the presence of a sharp discontinuity near FB along the BF axis for each best IPD. These discontinuities could in principle be detected by a gradient operation, which could be implemented neurally by a lateral inhibition mechanism as proposed by Klein and Hartmann (2).
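A gradient operation of this kind can be sketched in a few lines of Python. The sketch assumes a one-sided form of lateral inhibition in which each unit is excited by one BF channel and inhibited by the adjacent higher-BF channel, which amounts to a first difference along the BF axis. This is an illustrative choice, not the specific mechanism of Klein and Hartmann (2), and the firing rates are hypothetical.

```python
def lateral_inhibition_gradient(rates):
    """First difference along BF: excitation from channel i minus
    inhibition from the adjacent higher-BF channel i + 1."""
    return [rates[i] - rates[i + 1] for i in range(len(rates) - 1)]

def locate_edge(bfs, rates):
    """Return the BF midway between the two channels with the largest
    absolute rate difference."""
    gradient = lateral_inhibition_gradient(rates)
    k = max(range(len(gradient)), key=lambda i: abs(gradient[i]))
    return 0.5 * (bfs[k] + bfs[k + 1])

# Hypothetical rates (spikes/s) along BF for one best IPD, with a sharp
# drop in rate between the 500-Hz and 520-Hz channels.
bfs = [440.0, 460.0, 480.0, 500.0, 520.0, 540.0, 560.0]
bep_rates = [42.0, 41.0, 40.0, 38.0, 12.0, 11.0, 10.0]

edge_bf = locate_edge(bfs, bep_rates)
```

Because the operation responds only to local changes in rate, the broad regions above and below the edge contribute little, in contrast to subtraction of a noise-carrier pattern.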
Constraints on Processing the Activity Pattern of the Neural Population Model
The version of the neural population model in which information from all the neurons in the 2-D array organized along BF and best IPD was optimally combined (Fig. 7) predicted the frequency dependence of psychophysical detection for HP. However, it underestimated the difference in performance between HP− and HP+ and lacked an explicit method for estimating the perceived frequency of HP. Our earlier work (35) showed that imposing constraints on how information available in the 2-D map of model neurons is read out can improve predictions of psychophysical performance in ITD discrimination. The existence of constraints on readout is also supported by the observation that listeners are unable to perceptually group sound components across frequency based on a common ITD (103). We thus tested a suboptimal model in which firing rates were summed along the best IPD axis for each BF, yielding a one-dimensional array of marginal firing rates against BF, or “central spectrum,” on which HP detection was based. This suboptimal model predicted a difference in detection performance between HP− and HP+ more in line with the psychophysical data of Hartmann and Zhang (36) than did the optimal model (Fig. 8B) and also provided an estimate of the perceived HP frequency (Fig. 8C).
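The constrained readout can be sketched as follows. The sketch assumes that the 2-D map is stored as a list of rows, one per best IPD, with one column per BF, and that all neurons are independent with the same rate variance. Both are simplifying assumptions made only for illustration, and the function names and firing rates are hypothetical.

```python
import math

def central_spectrum(rate_map):
    """Sum firing rates along the best-IPD axis (rows) for each BF (column)."""
    return [sum(column) for column in zip(*rate_map)]

def marginal_dprime(hp_map, carrier_map, variance):
    """D' per BF channel computed from the summed (marginal) rates.

    Each neuron is assumed independent with the same rate variance, so the
    variance of a sum over n best IPDs is n * variance.
    """
    n_ipd = len(hp_map)
    sd = math.sqrt(n_ipd * variance)
    hp = central_spectrum(hp_map)
    carrier = central_spectrum(carrier_map)
    return [(h - c) / sd for h, c in zip(hp, carrier)]

# Hypothetical rates (spikes/s): 3 best IPDs (rows) by 4 BFs (columns).
carrier_map = [[30.0, 30.0, 30.0, 30.0],
               [20.0, 20.0, 20.0, 20.0],
               [10.0, 10.0, 10.0, 10.0]]
hp_map = [[30.0, 18.0, 30.0, 30.0],
          [20.0, 14.0, 20.0, 20.0],
          [10.0, 10.0, 10.0, 10.0]]

hp_spectrum = central_spectrum(hp_map)
dprime_by_bf = marginal_dprime(hp_map, carrier_map, variance=12.0)
```

Summation before comparison is what makes this readout suboptimal: rate changes of opposite sign at different best IPDs for the same BF would cancel in the marginal rates, whereas the optimal rule would retain them.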
The particular constraint imposed on the model readout, summing firing rates along the best IPD axis, is orthogonal to the constraint used by Hancock and Delgutte (35), where firing rates were summed along the BF axis for each best IPD. The use of different constraints reflects the different natures of the two tasks, in that perception of a tonelike dichotic pitch requires identifying a feature along the frequency axis whereas the ITD discrimination task modeled by Hancock and Delgutte (35) requires detecting features along the internal delay axis. We thus suggest that, although binaural processing is initially based on a 2-D map organized by BF and best IPD that is computed in the auditory brain stem, the central processor only has access to part of the information contained in this map, and the constraints on processing this information depend on the behavioral task.
The suboptimal version of the neural population model provided an estimate of the perceived pitch frequency of HP stimuli from the mode of the pattern of marginal D′ between responses to HP and responses to the noise carrier along the BF axis. This operation can be interpreted as an instance of the “winner take all” principle widely used in artificial neural networks (104, 105). The pitch frequencies estimated in this way differed for HP+ and HP− and deviated systematically from the boundary frequency FB. Although systematic deviations of the perceived pitch frequency from FB have been reported for BEP and its monaural counterpart (2), we are not aware of any report of such deviations for HP. Summarizing the results of unpublished experiments by Guttman (106) in which the pitch of HP stimuli with a boundary frequency near 500 Hz was matched to that of a pure tone, Bilsen (3) states that the pitch always corresponded to that of a 500-Hz pure tone, except when the relative transition bandwidth exceeded 25%. The deviations from FB predicted by the suboptimal model are greater than one semitone (about 6%, the smallest interval used in traditional Western music) and greater than the JND for HP frequency, which is <3% for FB near 500 Hz (8, 38). We therefore expect that such large deviations would have been reported if they actually occurred. The underlying reason for the large deviations from FB in the model is that the pattern of marginal D′ against BF often contains two modes located on opposite sides of FB (blue curve on the right of Fig. 8A). These modes correspond to regions of the 2-D map in which the firing rates for HP stimuli are, respectively, larger and smaller than the rates for the noise carrier (Fig. 8A). The winner-take-all strategy implemented in the model selects the largest of these modes, resulting in a large deviation from FB.
Alternatively, the pitch frequency could be estimated from a weighted sum of the two modes, which would yield a value closer to FB, but it would be hard to choose the weights in a principled way that would work in a wide range of cases.
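The difference between the two readouts can be made concrete with a small Python sketch. The weighting by the absolute value of D′ used here is an arbitrary assumption chosen only for illustration; it is not a principled rule and was not part of the model. The marginal D′ values are hypothetical.

```python
import math

def winner_take_all(bfs, dprimes):
    """BF of the channel with the largest absolute marginal D'."""
    k = max(range(len(bfs)), key=lambda i: abs(dprimes[i]))
    return bfs[k]

def weighted_two_mode_estimate(bfs, dprimes, fb):
    """Average of the largest mode on each side of fb, weighted by |D'|
    (an illustrative, unprincipled choice of weights)."""
    below = [(abs(d), f) for f, d in zip(bfs, dprimes) if f < fb]
    above = [(abs(d), f) for f, d in zip(bfs, dprimes) if f > fb]
    w_lo, f_lo = max(below)
    w_hi, f_hi = max(above)
    return (w_lo * f_lo + w_hi * f_hi) / (w_lo + w_hi)

def semitones(f, reference):
    """Signed interval between f and reference in equal-tempered semitones."""
    return 12.0 * math.log2(f / reference)

# Hypothetical marginal D' pattern with two modes on opposite sides of FB.
fb = 500.0
bfs = [440.0, 460.0, 480.0, 500.0, 520.0, 540.0, 560.0]
dprimes = [0.5, 2.0, 1.0, 0.0, -1.5, -3.0, -0.8]

wta_pitch = winner_take_all(bfs, dprimes)
weighted_pitch = weighted_two_mode_estimate(bfs, dprimes, fb)
wta_deviation = semitones(wta_pitch, fb)
weighted_deviation = semitones(weighted_pitch, fb)
```

In this example the winner-take-all estimate lies more than one semitone above FB, whereas the weighted estimate lies well within one semitone of FB. The result depends entirely on the assumed weights, which is the difficulty noted above.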
Conclusions
Dichotic pitches offer insights into the neural mechanisms for both binaural hearing and pitch perception. In this first neurophysiological study of Huggins pitch and binaural edge pitch at the single-unit level, we found that responses of IC neurons to these stimuli can be qualitatively predicted from a well-established cross-correlation model of binaural processing incorporating frequency tuning, internal delay, and a monotonic relationship between firing rate and interaural cross-correlation. Single-neuron JNDs for HP frequency discrimination computed from signal detection theory paralleled some trends in the psychophysical data but also differed with respect to the dependence on interaural phase, showing that additional processing beyond individual IC neurons is necessary. A Jeffress-type neural population model of ITD processing (35) predicted the frequency dependence of HP detection when it incorporated physiologically realistic distributions of BF and best IPD. Imposing an additional constraint on the model readout improved predictions with respect to the dependence on interaural phase pattern and also provided a method for estimating the perceived pitch frequency. Possible directions for future studies include behavioral studies of dichotic pitch in animal models, single-unit studies in unanesthetized preparations, exploration of temporal codes for dichotic pitches, and more comprehensive neural population models.
DATA AVAILABILITY
Data will be made available upon reasonable request.
GRANTS
This work was supported by National Institute on Deafness and Other Communication Disorders Grants P01 DC000119, R01 DC002258, and F32 DC005295.
DISCLOSURES
No conflicts of interest, financial or otherwise, are declared by the authors.
AUTHOR CONTRIBUTIONS
K.E.H. and B.D. conceived and designed research; K.E.H. performed experiments; K.E.H. analyzed data; K.E.H. and B.D. interpreted results of experiments; K.E.H. prepared figures; K.E.H. and B.D. drafted manuscript; K.E.H. and B.D. edited and revised manuscript; K.E.H. and B.D. approved final version of manuscript.
ACKNOWLEDGMENTS
We thank Leslie Liberman and Connie Miller for expert surgical support.
REFERENCES
- 1. Cramer EM, Huggins WH. Creation of pitch through binaural interaction. J Acoust Soc Am 30: 858–866, 1958.
- 2. Klein MA, Hartmann WM. Binaural edge pitch. J Acoust Soc Am 70: 51–61, 1981. doi: 10.1121/1.386581.
- 3. Bilsen FA. Pitch of noise signals: evidence for a “central spectrum”. J Acoust Soc Am 61: 150–161, 1977. doi: 10.1121/1.381276.
- 4. Hirsh IJ. The influence of interaural phase on interaural summation and inhibition. J Acoust Soc Am 20: 536–544, 1948. doi: 10.1121/1.1906407.
- 5. Licklider JCR. The influence of interaural phase relations upon the masking of speech by white noise. J Acoust Soc Am 20: 150–159, 1948. doi: 10.1121/1.1906358.
- 6. Colburn HS. Theory of binaural interaction based on auditory-nerve data. I. General strategy and preliminary results on interaural discrimination. J Acoust Soc Am 54: 1458–1470, 1973. doi: 10.1121/1.1914445.
- 7. Colburn HS. Theory of binaural interaction based on auditory-nerve data. II. Detection of tones in noise. J Acoust Soc Am 61: 525–533, 1977. doi: 10.1121/1.381294.
- 8. Culling JF, Summerfield AQ, Marshall DH. Dichotic pitches as illusions of binaural unmasking. I. Huggins' pitch and the “binaural edge pitch”. J Acoust Soc Am 103: 3509–3526, 1998. doi: 10.1121/1.423059.
- 9. Dougherty RF, Cynader MS, Bjornson BH, Edgell D, Giaschi DE. Dichotic pitch: a new stimulus distinguishes normal and dyslexic auditory function. Neuroreport 9: 3001–3005, 1998. doi: 10.1097/00001756-199809140-00015.
- 10. Edwards VT, Giaschi DE, Dougherty RF, Edgell D, Bjornson BH, Lyons C, Douglas RM. Psychophysical indexes of temporal processing abnormalities in children with developmental dyslexia. Dev Neuropsychol 25: 321–354, 2004. doi: 10.1207/s15326942dn2503_5.
- 11. Partanen M, Fitzpatrick K, Mädler B, Edgell D, Bjornson B, Giaschi DE. Cortical basis for dichotic pitch perception in developmental dyslexia. Brain Lang 123: 104–112, 2012. doi: 10.1016/j.bandl.2012.09.002.
- 12. Chait M, Eden G, Poeppel D, Simon JZ, Hill DF, Flowers DL. Delayed detection of tonal targets in background noise in dyslexia. Brain Lang 102: 80–90, 2007. doi: 10.1016/j.bandl.2006.07.001.
- 13. Santurette S, Poelmans H, Luts H, Ghesquiére P, Wouters J, Dau T. Detection and identification of monaural and binaural pitch contours in dyslexic listeners. J Assoc Res Otolaryngol 11: 515–524, 2010. doi: 10.1007/s10162-010-0216-5.
- 14. Lodhia V, Brock J, Johnson BW, Hautus MJ. Reduced object related negativity response indicates impaired auditory scene analysis in adults with autistic spectrum disorder. PeerJ 2: e261, 2014. doi: 10.7717/peerj.261.
- 15. Lodhia V, Hautus MJ, Johnson BW, Brock J. Atypical brain responses to auditory spatial cues in adults with autism spectrum disorder. Eur J Neurosci 47: 682–689, 2018. doi: 10.1111/ejn.13694.
- 16. Bilsen FA. Pronounced binaural pitch phenomenon. J Acoust Soc Am 59: 467–468, 1976. doi: 10.1121/1.380892.
- 17. Gockel HE, Carlyon RP, Plack CJ. Combination of spectral and binaurally created harmonics in a common central pitch processor. J Assoc Res Otolaryngol 12: 253–260, 2011. doi: 10.1007/s10162-010-0250-3.
- 18. Plack CJ, Oxenham AJ, Kreft HA, Carlyon RP. Central auditory masking by an illusory tone. PLoS One 8: e75822, 2013. doi: 10.1371/journal.pone.0075822.
- 19. Johnson BW, Hautus M, Clapp WC. Neural activity associated with binaural processes for the perceptual segregation of pitch. Clin Neurophysiol 114: 2245–2250, 2003. doi: 10.1016/s1388-2457(03)00247-5.
- 20. Johnson BW, Muthukumaraswamy SD, Hautus MJ, Gaetz WC, Cheyne DO. Neuromagnetic responses associated with perceptual segregation of pitch. Neurol Clin Neurophysiol 2004: 33, 2004.
- 21. Hertrich I, Mathiak K, Menning H, Lutzenberger W, Ackermann H. MEG responses to rippled noise and Huggins pitch reveal similar cortical representations. Neuroreport 16: 193–196, 2005. doi: 10.1097/00001756-200502080-00026.
- 22. Chait M, Poeppel D, Simon JZ. Neural response correlates of detection of monaurally and binaurally created pitches in humans. Cereb Cortex 16: 835–848, 2006. doi: 10.1093/cercor/bhj027.
- 23. Puschmann S, Uppenkamp S, Kollmeier B, Thiel CM. Dichotic pitch activates pitch processing centre in Heschl’s gyrus. Neuroimage 49: 1641–1649, 2010. doi: 10.1016/j.neuroimage.2009.09.045.
- 24. Mc Laughlin M, Van de Sande B, van der Heijden M, Joris PX. Comparison of bandwidths in the inferior colliculus and the auditory nerve. I. Measurement using a spectrally manipulated stimulus. J Neurophysiol 98: 2566–2579, 2007. doi: 10.1152/jn.00595.2007.
- 25. Alsindi S, Patterson RD, Sayles M, Winter IM. The responses of single units to simple and complex sounds in the superior olivary complex of guinea pigs. Acta Acust United Acust 104: 856–859, 2018. doi: 10.3813/AAA.919240.
- 26. Aitkin LM, Webster WR, Veale JL, Crosby DC. Inferior colliculus. I. Comparison of response properties of neurons in central, pericentral, and external nuclei of adult cat. J Neurophysiol 38: 1196–1207, 1975. doi: 10.1152/jn.1975.38.5.1196.
- 27. Ramachandran R, Davis KA, May BJ. Single-unit responses in the inferior colliculus of decerebrate cats. I. Classification based on frequency response maps. J Neurophysiol 82: 152–163, 1999. doi: 10.1152/jn.1999.82.1.152.
- 28. Palmer AR, Shackleton TM, Sumner CJ, Zobay O, Rees A. Classification of frequency response areas in the inferior colliculus reveals continua not discrete classes. J Physiol 591: 4003–4025, 2013. doi: 10.1113/jphysiol.2013.255943.
- 29. Albeck Y, Konishi M. Responses of neurons in the auditory pathway of the barn owl to partially correlated binaural signals. J Neurophysiol 74: 1689–1700, 1995. doi: 10.1152/jn.1995.74.4.1689.
- 30. Shackleton TM, Arnott RH, Palmer AR. Sensitivity to interaural correlation of single neurons in the inferior colliculus of guinea pigs. J Assoc Res Otolaryngol 6: 244–259, 2005. doi: 10.1007/s10162-005-0005-8.
- 31. Yin TC, Chan JC, Carney LH. Effects of interaural time delays of noise stimuli on low-frequency cells in the cat's inferior colliculus. III. Evidence for cross-correlation. J Neurophysiol 58: 562–583, 1987. doi: 10.1152/jn.1987.58.3.562.
- 32. Palmer AR, Jiang D, McAlpine D. Desynchronizing responses to correlated noise: a mechanism for binaural masking level differences at the inferior colliculus. J Neurophysiol 81: 722–734, 1999. doi: 10.1152/jn.1999.81.2.722.
- 33. Lane CC, Delgutte B. Neural correlates and mechanisms of spatial release from masking: single-unit and population responses in the inferior colliculus. J Neurophysiol 94: 1180–1198, 2005. doi: 10.1152/jn.01112.2004.
- 34. Fan L, Henry KS, Carney LH. Responses to dichotic tone-in-noise stimuli in the inferior colliculus. Front Neurosci 16: 997656, 2022. doi: 10.3389/fnins.2022.997656.
- 35. Hancock KE, Delgutte B. A physiologically based model of interaural time difference discrimination. J Neurosci 24: 7110–7117, 2004. doi: 10.1523/JNEUROSCI.0762-04.2004.
- 36. Hartmann WM, Zhang PX. Binaural models and the strength of dichotic pitches. J Acoust Soc Am 114: 3317–3326, 2003. doi: 10.1121/1.1624072.
- 37. Yost WA. Thresholds for segregating a narrow-band from a broadband noise based on interaural phase and level differences. J Acoust Soc Am 89: 838–844, 1991. doi: 10.1121/1.1894644.
- 38. Santurette S, Dau T. Binaural pitch perception in normal-hearing and hearing-impaired listeners. Hear Res 223: 29–47, 2007. doi: 10.1016/j.heares.2006.09.013.
- 39. Yost WA, Harder P, Dye R. Complex spectral patterns with interaural differences: dichotic pitch and the ‘central spectrum’. In: Auditory Processing of Complex Sounds, edited by Yost WA, Watson CS. Hillsdale, NJ: Lawrence Erlbaum, 1987, p. 190–201.
- 40. Kuwada S, Yin TC, Wickesberg RE. Response of cat inferior colliculus neurons to binaural beat stimuli: possible mechanisms for sound localization. Science 206: 586–588, 1979. doi: 10.1126/science.493964.
- 41. Cariani PA, Delgutte B. Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. J Neurophysiol 76: 1698–1716, 1996. doi: 10.1152/jn.1996.76.3.1698.
- 42. Raatgever J, Bilsen FA. A central spectrum theory of binaural processing. Evidence from dichotic pitch. J Acoust Soc Am 80: 429–441, 1986. doi: 10.1121/1.394039.
- 43. Zhang PX, Hartmann WM. Lateralization of Huggins pitch. J Acoust Soc Am 124: 3873–3887, 2008. doi: 10.1121/1.2977683.
- 44. Akeroyd MA, Summerfield AQ. The lateralization of simple dichotic pitches. J Acoust Soc Am 108: 316–334, 2000. doi: 10.1121/1.429467.
- 45. Yin TC, Chan JC, Irvine DR. Effects of interaural time delays of noise stimuli on low-frequency cells in the cat's inferior colliculus. I. Responses to wideband noise. J Neurophysiol 55: 280–300, 1986. doi: 10.1152/jn.1986.55.2.280.
- 46. Joris PX. Interaural time sensitivity dominated by cochlea-induced envelope patterns. J Neurosci 23: 6345–6350, 2003. doi: 10.1523/JNEUROSCI.23-15-06345.2003.
- 47. Carney LH. A model for the responses of low-frequency auditory-nerve fibers in cat. J Acoust Soc Am 93: 401–417, 1993. doi: 10.1121/1.405620.
- 48. McAlpine D, Jiang D, Palmer AR. A neural code for low-frequency sound localization in mammals. Nat Neurosci 4: 396–401, 2001. doi: 10.1038/86049.
- 49. Devore S, Ihlefeld A, Hancock K, Shinn-Cunningham B, Delgutte B. Accurate sound localization in reverberant environments is mediated by robust encoding of spatial cues in the auditory midbrain. Neuron 62: 123–134, 2009. doi: 10.1016/j.neuron.2009.02.018.
- 50. Green DM. Detection of multiple component signals in noise. J Acoust Soc Am 30: 904–911, 1958. doi: 10.1121/1.1909400.
- 51. Frijns JH, Raatgever J, Bilsen FA. A central spectrum theory of binaural processing. The binaural edge pitch revisited. J Acoust Soc Am 80: 442–451, 1986. doi: 10.1121/1.394040.
- 52. Malmierca MS, Izquierdo MA, Cristaudo S, Hernández O, Pérez-González D, Covey E, Oliver DL. A discontinuous tonotopic organization in the inferior colliculus of the rat. J Neurosci 28: 4767–4776, 2008. doi: 10.1523/JNEUROSCI.0238-08.2008.
- 53. Merzenich MM, Reid MD. Representation of the cochlea within the inferior colliculus of the cat. Brain Res 77: 397–415, 1974. doi: 10.1016/0006-8993(74)90630-1.
- 54. Culling JF. The existence region of Huggins' pitch. Hear Res 127: 143–148, 1999. doi: 10.1016/s0378-5955(98)00193-2.
- 55. Mc Laughlin M, Chabwine JN, van der Heijden M, Joris PX. Comparison of bandwidths in the inferior colliculus and the auditory nerve. II: Measurement using a temporally manipulated stimulus. J Neurophysiol 100: 2312–2327, 2008. doi: 10.1152/jn.90252.2008.
- 56. Agapiou JP, McAlpine D. Low-frequency envelope sensitivity produces asymmetric binaural tuning curves. J Neurophysiol 100: 2381–2396, 2008. doi: 10.1152/jn.90393.2008.
- 57. Devore S, Delgutte B. Effects of reverberation on the directional sensitivity of auditory neurons across the tonotopic axis: influences of interaural time and level differences. J Neurosci 30: 7826–7837, 2010. doi: 10.1523/JNEUROSCI.5517-09.2010.
- 58. Maier JK, Hehrmann P, Harper NS, Klump GM, Pressnitzer D, McAlpine D. Adaptive coding is constrained to midline locations in a spatial listening task. J Neurophysiol 108: 1856–1868, 2012. doi: 10.1152/jn.00652.2011.
- 59. Jiang D, McAlpine D, Palmer AR. Detectability index measures of binaural masking level difference across populations of inferior colliculus neurons. J Neurosci 17: 9331–9339, 1997. doi: 10.1523/JNEUROSCI.17-23-09331.1997.
- 60. Jiang D, McAlpine D, Palmer AR. Responses of neurons in the inferior colliculus to binaural masking level difference stimuli measured by rate-versus-level functions. J Neurophysiol 77: 3085–3106, 1997. doi: 10.1152/jn.1997.77.6.3085.
- 61. Shera CA, Charaziak KK. Cochlear frequency tuning and otoacoustic emissions. Cold Spring Harb Perspect Med 9: a033498, 2019. doi: 10.1101/cshperspect.a033498.
- 62. Young ED, Sachs MB. Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory-nerve fibers. J Acoust Soc Am 66: 1381–1403, 1979. doi: 10.1121/1.383532.
- 63. Kuwada S, Batra R, Stanford TR. Monaural and binaural response properties of neurons in the inferior colliculus of the rabbit: effects of sodium pentobarbital. J Neurophysiol 61: 269–282, 1989. doi: 10.1152/jn.1989.61.2.269.
- 64. Chung Y, Hancock KE, Nam SI, Delgutte B. Coding of electric pulse trains presented through cochlear implants in the auditory midbrain of awake rabbit: comparison with anesthetized preparations. J Neurosci 34: 218–231, 2014. doi: 10.1523/JNEUROSCI.2084-13.2014.
- 65. Devore S. Neural Correlates and Mechanisms of Sound Localization in Everyday Reverberant Settings (PhD thesis). Cambridge, MA: Massachusetts Institute of Technology, 2009.
- 66. Ryan A, Miller J. Effects of behavioral performance on single-unit firing patterns in inferior colliculus of the rhesus monkey. J Neurophysiol 40: 943–956, 1977. doi: 10.1152/jn.1977.40.4.943.
- 67. Shaheen LA, Slee SJ, David SV. Task engagement improves neural discriminability in the auditory midbrain of the marmoset monkey. J Neurosci 41: 284–297, 2021. doi: 10.1523/JNEUROSCI.1112-20.2020.
- 68. Rocchi F, Ramachandran R. Foreground stimuli and task engagement enhance neuronal adaptation to background noise in the inferior colliculus of macaques. J Neurophysiol 124: 1315–1326, 2020. doi: 10.1152/jn.00153.2020.
- 69. Cariani PA, Colburn HS, Hartmann WM. Temporal model of edge pitch effects (Abstract). J Acoust Soc Am 137: 2204, 2015. doi: 10.1121/1.4920017.
- 70. Hartmann WM, Cariani PA, Colburn HS. Noise edge pitch and models of pitch perception. J Acoust Soc Am 145: 1993, 2019. doi: 10.1121/1.5093546.
- 71. Joris PX, Schreiner CE, Rees A. Neural processing of amplitude-modulated sounds. Physiol Rev 84: 541–577, 2004. doi: 10.1152/physrev.00029.2003.
- 72. Liu LF, Palmer AR, Wallace MN. Phase-locked responses to pure tones in the inferior colliculus. J Neurophysiol 95: 1926–1935, 2006. doi: 10.1152/jn.00497.2005.
- 73. Harper NS, McAlpine D. Optimal neural population coding of an auditory spatial cue. Nature 430: 682–686, 2004. doi: 10.1038/nature02768.
- 74. Harper NS, Scott BH, Semple MN, McAlpine D. The neural code for auditory space depends on sound frequency and head size in an optimal manner. PLoS One 9: e108154, 2014. doi: 10.1371/journal.pone.0108154.
- 75. Grothe B, Pecka M. The natural history of sound localization in mammals—a story of neuronal inhibition. Front Neural Circuits 8: 116, 2014. doi: 10.3389/fncir.2014.00116.
- 76. Schnupp JW, Carr CE. On hearing with more than one ear: lessons from evolution. Nat Neurosci 12: 692–697, 2009. doi: 10.1038/nn.2325.
- 77. Wakeford OS, Robinson DE. Detection of binaurally masked tones by the cat. J Acoust Soc Am 56: 952–956, 1974. doi: 10.1121/1.1903354.
- 78. Hoppe SA, Langford TL. Binaural interaction in cat and man. I. Signal detection and noise cross correlation. J Acoust Soc Am 55: 1263–1265, 1974. doi: 10.1121/1.1914695.
- 79. Zheng L, Early SJ, Mason CR, Idrobo F, Harrison JM, Carney LH. Binaural detection with narrowband and wideband reproducible noise maskers: II. Results for rabbit. J Acoust Soc Am 111: 346–356, 2002. doi: 10.1121/1.1423930.
- 80. Houben D, Gourevitch G. Auditory lateralization in monkeys: an examination of two cues serving directional hearing. J Acoust Soc Am 66: 1057–1063, 1979. doi: 10.1121/1.383377.
- 81. Ebert CS Jr, Blanks DA, Patel MR, Coffey CS, Marshall AF, Fitzpatrick DC. Behavioral sensitivity to interaural time differences in the rabbit. Hear Res 235: 134–142, 2008. doi: 10.1016/j.heares.2007.11.003.
- 82. Keating P, Nodal FR, Gananandan K, Schulz AL, King AJ. Behavioral sensitivity to broadband binaural localization cues in the ferret. J Assoc Res Otolaryngol 14: 561–572, 2013. doi: 10.1007/s10162-013-0390-3.
- 83. Tolnai S, Beutelmann R, Klump GM. Interaction of interaural cues and their contribution to the lateralisation of Mongolian gerbils (Meriones unguiculatus). J Comp Physiol A Neuroethol Sens Neural Behav Physiol 204: 435–448, 2018. doi: 10.1007/s00359-018-1253-5.
- 84. Li K, Chan CH, Rajendran VG, Meng Q, Rosskothen-Kuhl N, Schnupp JW. Microsecond sensitivity to envelope interaural time differences in rats. J Acoust Soc Am 145: EL341, 2019. doi: 10.1121/1.5099164.
- 85. Fay RR. Perception of pitch by goldfish. Hear Res 205: 7–20, 2005. doi: 10.1016/j.heares.2005.02.006.
- 86. Shofner WP. Perception of the missing fundamental by chinchillas in the presence of low-pass masking noise. J Assoc Res Otolaryngol 12: 101–112, 2011. doi: 10.1007/s10162-010-0237-0.
- 87. Shofner WP. Perception of the periodicity strength of complex sounds by the chinchilla. Hear Res 173: 69–81, 2002. doi: 10.1016/s0378-5955(02)00612-3.
- 88. Plack CJ, Turgeon M, Lancaster S, Carlyon RP, Gockel HE. Frequency discrimination duration effects for Huggins pitch and narrowband noise (L). J Acoust Soc Am 129: 1–4, 2011. doi: 10.1121/1.3518745.
- 89. Parker AJ, Newsome WT. Sense and the single neuron: probing the physiology of perception. Annu Rev Neurosci 21: 227–277, 1998. doi: 10.1146/annurev.neuro.21.1.227.
- 90. Akeroyd MA, Moore BC, Moore GA. Melody recognition using three types of dichotic-pitch stimulus. J Acoust Soc Am 110: 1498–1504, 2001. doi: 10.1121/1.1390336.
- 91. Shackleton TM, Skottun BC, Arnott RH, Palmer AR. Interaural time difference discrimination thresholds for single neurons in the inferior colliculus of guinea pigs. J Neurosci 23: 716–724, 2003. doi: 10.1523/JNEUROSCI.23-02-00716.2003.
- 92. Skottun BC, Shackleton TM, Arnott RH, Palmer AR. The ability of inferior colliculus neurons to signal differences in interaural delay. Proc Natl Acad Sci USA 98: 14050–14054, 2001. doi: 10.1073/pnas.241513998.
- 93. Shackleton TM, Palmer AR. Contributions of intrinsic neural and stimulus variance to binaural sensitivity. J Assoc Res Otolaryngol 7: 425–442, 2006. doi: 10.1007/s10162-006-0054-7.
- 94. Durlach NI. Note on the creation of pitch through binaural interaction. J Acoust Soc Am 34: 1096–1099, 1962. doi: 10.1121/1.1918251.
- 95. Culling JF, Marshall DH, Summerfield AQ. Dichotic pitches as illusions of binaural unmasking. II. The Fourcin pitch and the dichotic repetition pitch. J Acoust Soc Am 103: 3527–3539, 1998. doi: 10.1121/1.423060.
- 96. Jeffress LA. A place theory of sound localization. J Comp Physiol Psychol 41: 35–39, 1948. doi: 10.1037/h0061495.
- 97. Stern RM Jr, Colburn HS. Theory of binaural interaction based on auditory-nerve data. IV. A model for subjective lateral position. J Acoust Soc Am 64: 127–140, 1978. doi: 10.1121/1.381978.
- 98. Stern RM, Zeiberg AS, Trahiotis C. Lateralization of complex binaural stimuli: a weighted-image model. J Acoust Soc Am 84: 156–165, 1988. doi: 10.1121/1.396982.
- 99. Lesica NA, Lingner A, Grothe B. Population coding of interaural time differences in gerbils and barn owls. J Neurosci 30: 11696–11702, 2010. doi: 10.1523/JNEUROSCI.0846-10.2010.
- 100. Averbeck BB, Latham PE, Pouget A. Neural correlations, population coding and computation. Nat Rev Neurosci 7: 358–366, 2006. doi: 10.1038/nrn1888.
- 101. Eyherabide HG, Samengo I. When and why noise correlations are important in neural decoding. J Neurosci 33: 17921–17936, 2013. doi: 10.1523/JNEUROSCI.0357-13.2013.
- 102. Colburn HS, Carney LH, Heinz MG. Quantifying the information in auditory-nerve responses for level discrimination. J Assoc Res Otolaryngol 4: 294–311, 2003. doi: 10.1007/s10162-002-1090-6.
- 103. Culling JF, Summerfield Q. Perceptual separation of concurrent speech sounds: absence of across-frequency grouping by common interaural delay. J Acoust Soc Am 98: 785–797, 1995. doi: 10.1121/1.413571.
- 104. Grossberg S. Contour enhancement, short term memory, and constancies in reverberating neural networks. In: Studies of Mind and Brain. Dordrecht, The Netherlands: Springer, 1982, p. 332–378.
- 105. Oster M, Douglas R, Liu SC. Computation with spikes in a winner-take-all network. Neural Comput 21: 2437–2465, 2009. doi: 10.1162/neco.2009.07-08-829.
- 106. Guttman N. Pitch and loudness of a binaural subjective tone (Abstract). J Acoust Soc Am 34: 1996, 1962.