Author manuscript; available in PMC: 2014 May 1.
Published in final edited form as: J Comp Psychol. 2012 Sep 17;127(2):142–153. doi: 10.1037/a0029734

Processing pitch in a non-human mammal (Chinchilla laniger)

William P. Shofner, Megan Chaney
PMCID: PMC3764596  NIHMSID: NIHMS507355  PMID: 22985274

Abstract

Whether the mechanisms giving rise to pitch reflect spectral or temporal processing has long been debated. Generally, sounds having strong harmonic structures in their spectra have strong periodicities in their temporal structures. We found that when a wideband harmonic tone complex is passed through a noise vocoder, the resulting sound can have a harmonic structure with a large peak-to-valley ratio, but with little or no periodicity in the temporal structure. To test the role of harmonic structure in pitch perception for a non-human mammal, we measured behavioral responses to noise-vocoded tone complexes in chinchillas using a stimulus generalization paradigm. Animals discriminated either a harmonic tone complex or an iterated rippled noise from a 1-channel vocoded version of the tone complex. When tested with vocoded versions generated with 8, 16, 32, 64 and 128 channels, responses were similar to those of the 1-channel version. Behavioral responses could not be accounted for based on harmonic peak-to-valley ratio as the acoustic cue, but could be accounted for based on temporal properties of the autocorrelation functions such as periodicity strength or the height of the first peak. The results suggest that pitch perception does not arise through spectral processing in non-human mammals, but rather through temporal processing. The conclusion that spectral processing contributes little to pitch in non-human mammals may reflect broader cochlear tuning than that described in humans.

Introduction

Pitch is one of the most fundamental auditory perceptions. In speech perception, pitch plays a role in prosody (Ohala, 1983), whereas variations in pitch provide linguistic meaning to words in tonal languages (Lee, Vakoch, & Wurm, 1996; Cutler & Chen, 1997; Ye & Connine, 1999). Pitch also plays a role in gender identification (Gelfer & Mikos, 2005) and may provide attention cues for infant-directed speech (Gauthier & Shi, 2011). In music perception, pitch is essential for melody recognition (Dowling & Fujitani, 1971; Cousineau, Demany, & Pressnitzer, 2009), and impairments in melody recognition in amusic listeners are related to deficits in pitch perception (Ayotte, Peretz, & Hyde, 2002; Peretz et al., 2002). Pitch is also thought to play a role in the ability of human listeners to group and segregate sound sources (Darwin, 2005).

Whether pitch is processed through spectral or temporal mechanisms is an issue still debated today, more than 160 years after the original dispute between Seebeck and Ohm (Turner, 1977). Models utilizing either spectral-only (Wightman, 1973; Goldstein, 1973; Cohen, Grossberg, & Wyse, 1995; Shamma & Klein, 2000; McLachlan, 2011) or temporal-only processing (Licklider, 1951; Yost & Hill, 1979; Meddis & O’Mard, 1997; de Cheveigne, 1998; Balaguer-Ballester, Denham, & Meddis, 2008) can account for a wide variety of pitch percepts, but have been unable to rule out conclusively one mechanism in favor of the other. The debate continues largely because it is difficult to alter the spectral structure of a sound without altering its temporal structure. Sounds showing strong harmonic structures in their spectra generally show strong periodicities in their autocorrelation functions (ACFs). For example, compare the spectra and ACFs for a harmonic tone complex and an iterated rippled noise (Figure 1). Both of these sounds evoke a pitch of 500 Hz in human listeners. Note that the peak-to-valley ratios of the harmonics in the spectrum of the tone complex (Figure 1A) are larger than those of the iterated rippled noise (Figure 1B), and that the heights and numbers of peaks in the ACF are also larger for the tone complex (Figure 1C) than for the rippled noise (Figure 1D). An ideal approach would be to study pitch perception using sounds in which spectral and temporal properties can be manipulated independently. This ideal is unattainable, however, because the power spectrum and the ACF are a Fourier pair and thus by definition cannot be manipulated independently.
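
This Fourier-pair relationship can be made concrete in a few lines of code. The sketch below (ours, not part of the original study) computes the ACF of a wideband harmonic tone complex as the inverse transform of its power spectrum (the Wiener–Khinchin relation); the stimulus parameters mirror those used later in the Methods.

```python
# Minimal sketch of the spectrum/ACF Fourier-pair relationship (Wiener-Khinchin):
# the ACF is the inverse Fourier transform of the power spectrum, so the two
# cannot be manipulated independently.
import numpy as np

fs = 50000                       # sampling rate (Hz), as in the Methods
t = np.arange(0, 0.5, 1 / fs)    # 500-ms stimulus

# Wideband harmonic tone complex: 500-Hz F0, equal-amplitude sine-phase harmonics
f0 = 500
x = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, 21))   # harmonics to 10 kHz

power = np.abs(np.fft.rfft(x)) ** 2       # power spectrum
acf = np.fft.irfft(power)                 # autocorrelation function
acf /= acf[0]                             # normalize to the zero-lag value

lag = int(0.002 * fs)                     # 2-ms lag corresponds to a 500-Hz pitch
print(f"Normalized ACF at 2 ms: {acf[lag]:.3f}")   # near 1.0 for this complex
```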

Figure 1.

Example spectra (A, B) and autocorrelation functions (C, D) illustrate harmonic structure and periodicity for a harmonic tone complex and an infinitely iterated rippled noise, both of which evoke a 500 Hz pitch. Vertical lines indicate the frequencies of the harmonics. (A). Spectrum of a wideband harmonic tone complex comprised of a fundamental frequency of 500 Hz and successive harmonics up to and including the 20th harmonic. (B). Spectrum of an infinitely iterated rippled noise having a delay of 2 ms and a delayed noise attenuation of −1 dB (see Methods). The spectrum has been smoothed using a 7-bin moving window average. (C). Autocorrelation function for the harmonic tone complex. (D). Autocorrelation function for the infinitely iterated rippled noise. The gray shaded areas indicate the frequencies corresponding to harmonics 1–10. The horizontal dashed lines indicate 2 standard deviations.

A more realistic approach is to manipulate various acoustic features of complex sounds and then compare behavioral performance to predictions based on certain assumptions regarding spectral or temporal processing. Examples in the literature include the use of high-pass filtering to eliminate resolved harmonic components (Houtsma & Smurzynski, 1990; Patterson, Handel, Yost, & Datta, 1996), the use of different delay-and-add networks to generate different types of iterated rippled noises (Yost, Patterson, & Sheft, 1996; Yost, 1996), and the use of transposed tones (Oxenham, Bernstein, & Penagos, 2004). Another interesting example is the use of sinusoidally amplitude-modulated noise (SAM-noise). SAM-noise has no harmonic structure in its spectrum and shows no periodicity in its waveform ACF, but it does contain temporal modulations in the envelope which can evoke pitch percepts (Burns & Viemeister, 1976; 1981). SAM-noise is thus a stimulus having a temporal structure but no harmonic structure, and it can be argued that any pitch percept evoked by SAM-noise must arise through temporal processing.

In the present study, we show that by passing a wideband harmonic tone complex (wHTC) through a noise vocoder, the resulting sound can have a harmonic structure with little or no temporal structure. The vocoder, or voice encoder, was originally a device developed at Bell Telephone Laboratories for the analysis and resynthesis of speech (Dudley, 1939). A noise vocoder analyzes a sound, such as speech, by first passing the sound through a fixed number of contiguous bandpass filters and extracting the envelope from each filter, or channel. The envelope from each channel is then used to modulate a corresponding contiguous bandpass noise, and the modulated bandpass noises are summed to yield the resynthesized sound. Recently, vocoders have been used to degrade the features of sounds for studies in speech perception (Shannon et al., 1995; Dorman, Loizou, & Rainey, 1997; Shannon et al., 1998; Loizou, Dorman, & Tu, 1999; Friesen et al., 2001; Loebach & Pisoni, 2008), melody recognition and pitch discrimination (Smith, Delgutte, & Oxenham, 2002; Green, Faulkner, & Rosen, 2002; Qin & Oxenham, 2005), and the recognition of environmental sounds (Loebach & Pisoni, 2008; Shafiro, 2008). We found that vocoded versions of a wHTC can have harmonic structures with large peak-to-valley ratios in the spectra, but with little or no periodicity strength in the ACFs. In some respects, noise-vocoded wHTCs are the antithesis of SAM-noises: SAM-noises have no harmonic structures but do have temporal structures, whereas noise-vocoded wHTCs have harmonic structures but little or no temporal structures.
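
The analysis/resynthesis scheme just described can be summarized in a short sketch. The filter design below (second-order Butterworth bands with log-spaced edges) is an illustrative assumption and not the Tiger CIS implementation used in the Methods, which employed Greenwood-spaced channels with 24 dB/octave slopes.

```python
# Minimal noise-vocoder sketch: band-split the input, extract each band's
# envelope, and use the envelopes to modulate matching bands of white noise.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, fs, n_channels, f_lo=200.0, f_hi=7000.0, env_cutoff=160.0):
    rng = np.random.default_rng(0)
    carrier = rng.standard_normal(len(x))              # white-noise carrier
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # band edges (assumed log-spaced)
    env_lp = butter(2, env_cutoff, btype='lowpass', fs=fs, output='sos')
    y = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = butter(2, [lo, hi], btype='bandpass', fs=fs, output='sos')
        env = sosfiltfilt(env_lp, np.abs(sosfiltfilt(band, x)))   # channel envelope
        y += np.maximum(env, 0.0) * sosfiltfilt(band, carrier)    # remodulate noise band
    return y / np.max(np.abs(y))                       # rescale to unit peak

# Usage: vocode a 500-Hz wHTC with 8 channels
fs = 50000
t = np.arange(int(0.5 * fs)) / fs
whtc = sum(np.sin(2 * np.pi * 500 * k * t) for k in range(1, 21))
vocoded_8 = noise_vocode(whtc, fs, 8)
```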

Pitch perception is not a distinctively human characteristic, as pitch-like percepts are common across mammalian species. Chinchillas have a perceptual dimension corresponding to pitch (Shofner, Yost, & Whitmer, 2007), including a perception of the missing fundamental (Shofner, 2011). In addition, the audiogram (Heffner & Heffner, 1991) and the spectral dominance region for pitch (Shofner & Yost, 1997) of chinchillas are both similar to those of human listeners. We used noise-vocoded wHTCs to investigate the role of spectral processing in pitch perception for the chinchilla. First, we were interested in gaining some insight into which processing scheme, spectral or temporal, reflects the more primitive state from an evolutionary perspective. Second, because neurophysiological studies related to pitch are based on animal models (e.g. Winter, Wiegrebe, & Patterson, 2001; Langner, Albert, & Briede, 2002; Bendor & Wang, 2005; Shofner, 2008), we were interested in understanding pitch processing for a non-human mammal, particularly in light of possible differences in cochlear tuning between humans and non-human mammals (Shera, Guinan, & Oxenham, 2002, 2010; Oxenham & Shera, 2003; Joris et al., 2011). Although these differences are controversial (Ruggero & Temchin, 2005; Siegel et al., 2005; Eustaquio-Martin & Lopez-Poveda, 2011), it is predicted that broader cochlear tuning in a non-human mammal will degrade spectral processing.

Pitch perception in chinchillas was studied using a stimulus generalization paradigm, in which an animal is presented with a standard stimulus and is trained to respond to a signal stimulus. Stimuli vary systematically along one or more stimulus dimensions (Malott & Malott, 1970), and a systematic change in behavioral response along the physical dimension of the stimulus is known as a generalization gradient. In the present study, standard stimuli were broadband noises, signal stimuli were either wHTCs or infinitely iterated rippled noises (IIRNs), and test stimuli consisted of noise-vocoded wHTCs. Test stimuli that evoke behavioral responses similar to those evoked by either the signal or the standard indicate a similarity (Guttman, 1963) or perceptual equivalence (Hulse, 1995) between the stimuli. If pitch strength or salience is based on spectral peak-to-valley ratio, then noise-vocoded wHTCs having strong harmonic structures but no temporal structures should evoke behavioral responses similar to those evoked by the wHTC or IIRN signals, and as the number of analysis/resynthesis channels decreases, behavioral responses should decrease systematically as the harmonic structure degrades.

METHODS

The procedures used were approved by the Institutional Animal Care and Use Committee for the Bloomington Campus of Indiana University.

Subjects

Four adult male chinchillas (Chinchilla laniger) served as subjects in these experiments. Each chinchilla (c12, c15, c24, c36) had previous experience in the behavioral paradigm and had served as a subject in a missing-fundamental study (Shofner, 2011). Chinchillas c12, c15, and c24 began testing at the same time and were tested in all four experiments; c36 was added to the study later and was tested in the first two experiments only. Given the small variability in responses among the three chinchillas tested in Experiments 3 and 4, it was not necessary to include c36 in those conditions. Chinchillas received food pellet rewards during behavioral testing, and their body weights were maintained at 80–90% of their normal weights. All animals were in good health during the period of data collection.

Acoustic Stimuli

Stimulus presentation and data acquisition were under the control of a Gateway computer and Tucker-Davis Technologies System II modules. Stimuli were played through a D/A converter at a conversion rate of 50 kHz and low-pass filtered at 15 kHz. The output of the low-pass filter was amplified, attenuated, and played through a loudspeaker. All stimuli had durations of 500 ms with 10-ms rise/fall times. The sound pressure level was determined by placing a condenser microphone at the approximate position of an animal’s head and measuring the A-weighted sound pressure level with a sound spectrum analyzer (Ivie IE-33); the sound level was fixed at 73 dBA for all sounds. The frequency response of the system varied ±8 dB from 100 Hz to 10,000 Hz (Fig. 2).

Figure 2.

Frequency response of the acoustic system measured using a 40-μs click. The condenser microphone was placed in the approximate position of an animal’s head, and the output of the loudspeaker was analyzed by measuring the C-weighted sound level at 219 frequencies (×s) between 21 and 19,957 Hz with an Ivie IE-33 spectrum analyzer. The horizontal lines show that the relative frequency response varies ±8 dB over a frequency range of 100–10,000 Hz.

Wideband harmonic tone complexes (wHTCs) were generated on a digital array processor at a sampling rate of 50 kHz and stored as 16-bit integer files. Tone complexes were comprised of a fundamental frequency (F0) of either 250 Hz or 500 Hz and each successive harmonic up to 10 kHz. Harmonics were of equal amplitude and added in sine-starting phase. Noise-vocoded versions of the wHTCs were generated using Tiger CIS version 1.05.02 developed by Qian-Jie Fu (http://tigerspeech.com). The original 16-bit integer files were first converted to wav files with a sampling rate of 44.1 kHz using Cool Edit 2000 and then processed through the vocoder. The vocoder first analyzed the wHTC through a series of bandpass filters spanning 200–7000 Hz. Default filter slopes of 24 dB/octave and center frequencies based on the Greenwood function were used. The number of channels varied from 1 to 128, and the carrier type of the vocoder was set to white noise. The present study used noise-vocoded wHTCs based on 1, 8, 16, 32, 64, and 128 channels. The envelope from each channel was extracted using a lowpass cut-off frequency of 160 Hz for the 500 Hz F0 wHTC and 80 Hz for the 250 Hz F0 wHTC. The vocoded versions of the wHTCs were saved as wav files and reconverted to 16-bit integer files at a sampling rate of 50 kHz using Cool Edit 2000.
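
For illustration, a minimal sketch of the wHTC synthesis described above is given below; the raised-cosine ramp shape is our assumption, as the paper specifies only the 10-ms rise/fall times.

```python
# Sketch of wHTC synthesis: equal-amplitude, sine-phase harmonics of F0 up to
# 10 kHz, 500 ms long, with 10-ms onset/offset ramps (ramp shape assumed).
import numpy as np

def make_whtc(f0, fs=50000, dur=0.5, ramp=0.01, f_max=10000):
    t = np.arange(int(dur * fs)) / fs
    n_harm = int(f_max // f0)          # 20 harmonics for 500 Hz, 40 for 250 Hz
    x = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, n_harm + 1))
    n = int(ramp * fs)
    gate = np.ones(len(t))
    edge = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))   # raised-cosine edge
    gate[:n], gate[-n:] = edge, edge[::-1]
    return x * gate

whtc_500 = make_whtc(500)   # 500-Hz F0 complex
whtc_250 = make_whtc(250)   # 250-Hz F0 complex
```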

Details of the generation of infinitely iterated rippled noise (IIRN) and cosine noise (CosN) have been described previously (Shofner, Whitmer, & Yost, 2005). CosN was generated by delaying a wideband noise and adding the unattenuated, delayed noise to the original version of the noise; that is, CosN is generated by applying the delay-and-add process once. For IIRN, the delayed version of the noise was attenuated (−1 dB or −4 dB) and then added (+) to the original noise through a positive feedback loop. Adding the delayed noise through a positive feedback loop is mathematically equivalent to applying the delay-and-add process through an infinite number of iterations. IIRN stimuli will be referred to as IIRN[+, d, dB atten], where d is the delay and dB atten is the delayed noise attenuation. Delays were fixed at 2 ms (500 Hz pitch) or 4 ms (250 Hz pitch). For each rippled noise, 5 s of the waveform was sampled at 50 kHz and stored as a 16-bit integer file. A random 500-ms sample was extracted from the 5-s stimulus file for presentation in a block of trials during the behavioral testing session.
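
The delay-and-add operations described above map onto simple filters, as the sketch below illustrates; it is a schematic of the generation scheme, not the original array-processor code.

```python
# CosN: one unattenuated delay-and-add pass. IIRN[+, d, dB]: the attenuated,
# delayed noise is fed back through a positive feedback loop, i.e. an IIR comb
# filter y[n] = x[n] + g*y[n-D], equivalent to infinitely many iterations.
import numpy as np
from scipy.signal import lfilter

def cosine_noise(x, fs, delay_s):
    d = int(round(delay_s * fs))
    y = x.astype(float)
    y[d:] += x[:-d]                    # single delay-and-add
    return y

def iirn(x, fs, delay_s, atten_db):
    d = int(round(delay_s * fs))
    g = 10.0 ** (atten_db / 20.0)      # -1 dB -> gain of about 0.89
    a = np.zeros(d + 1)
    a[0], a[d] = 1.0, -g               # denominator of y[n] = x[n] + g*y[n-d]
    return lfilter([1.0], a, x)

rng = np.random.default_rng(1)
noise = rng.standard_normal(5 * 50000)         # 5 s of wideband noise at 50 kHz
iirn_2ms = iirn(noise, 50000, 0.002, -1.0)     # IIRN[+, 2 ms, -1 dB]: 500-Hz pitch
```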

Spectra for 1-, 64-, and 128-channel noise-vocoded 500 Hz F0 wHTCs are illustrated in Figure 3A–C. Note that the 1-channel version is essentially a broadband noise. Both the 64-channel and the 128-channel noise-vocoded wHTCs show strong harmonic structures over harmonics 1–10, but the peak-to-valley ratios appear smaller than that of the wHTC (Figure 1A). The peak-to-valley ratios used in the present study were estimated from the acoustic spectra. A condenser microphone was placed in the approximate position of an animal’s head, and the output of the loudspeaker was displayed as a spectrum on the Ivie IE-33 spectrum analyzer at 219 frequencies between 21 and 19,957 Hz using the C-weighted sound level. Harmonic structure was quantified by computing the average peak-to-valley ratio for harmonics 1–10 as

Figure 3.

Example spectra (A–C) and autocorrelation functions (D–F) illustrate harmonic structure and periodicity for noise-vocoded versions of the 500 Hz harmonic tone complex. The number of analysis/resynthesis channels used in the vocoding process is 1 (A, D), 64 (B, E), and 128 (C, F). The gray shaded areas indicate the frequencies corresponding to harmonics 1–10. Vertical lines indicate the frequencies of the harmonics. The horizontal dashed lines indicate 2 standard deviations. Spectra have been smoothed using a 7-bin moving window average.

$$\text{Ave.P-V (dB)} = \frac{\lvert dB_{1} - dB_{1.5}\rvert + \lvert dB_{1.5} - dB_{2}\rvert + \cdots + \lvert dB_{10} - dB_{10.5}\rvert}{19} \qquad \text{(Eq. 1)}$$

where Ave.P-V (dB) is the average peak-to-valley ratio expressed in decibels and dB_N is the spectral amplitude at harmonic number N; integer values of N correspond to the harmonic peaks, and half-integer values correspond to the intervening valleys. The heights and number of peaks in the ACFs of the vocoded wHTCs (Figure 3D–F) are reduced relative to those of the wHTC (Figure 1C), indicating that the periodicity strengths of the vocoded wHTCs are less than that of the wHTC. Periodicity strength was quantified from the first 20 ms of the ACF and was based on the sum of the crest factors of individual peaks. The standard deviation of the ACF was first computed over time lags between 0.04 and 20 ms. A peak in the ACF was considered significant if its height was greater than 2 standard deviations. Periodicity strength (PS) was defined as

$$PS = \sum_{i=1}^{n} \frac{AC_{i}}{\sigma_{ACF}} \qquad \text{(Eq. 2)}$$

where AC_i is the height of a significant peak, σ_ACF is the standard deviation of the ACF, and n is the number of significant peaks in the ACF between 0.04 and 20 ms. Periodicity strength was defined in this manner in order to have an estimate based on both the heights and the number of peaks. If no significant peaks were found in the ACF, then periodicity strength was 0. Table 1 summarizes the average peak-to-valley ratios and periodicity strengths for the stimuli used in the present study.
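
A compact sketch of this measure is given below; the simple local-maximum peak picker is our assumption, as the paper does not specify how peaks were located.

```python
# Periodicity strength (Eq. 2): sum of AC_i / sigma_ACF over all ACF peaks
# exceeding 2 standard deviations at lags of 0.04-20 ms.
import numpy as np
from scipy.signal import find_peaks

def periodicity_strength(x, fs=50000):
    acf = np.fft.irfft(np.abs(np.fft.rfft(x)) ** 2)   # ACF via Wiener-Khinchin
    acf /= acf[0]                                     # normalize zero-lag to 1
    lo, hi = int(0.00004 * fs), int(0.020 * fs)       # 0.04-20 ms lag window
    segment = acf[lo:hi]
    sigma = np.std(segment)                           # SD of the ACF over the window
    peaks, _ = find_peaks(segment, height=2 * sigma)  # significant peaks only
    return segment[peaks].sum() / sigma if len(peaks) else 0.0
```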

Table 1.

Average spectral peak-to-valley ratios (P-V (dB)) for harmonics 1–10; periodicity strengths for the first 20 ms of the ACFs, normalized to the periodicity strength of the wHTC (PS re: PS(wHTC)); and heights of the first peak in the autocorrelation function (AC1) for the 500 Hz F0 and 250 Hz F0 stimuli used in the present study. These values were used to predict the behavioral responses.

                       500 Hz F0                            250 Hz F0
Stimulus          P-V (dB)     PS re: PS(wHTC)  AC1         P-V (dB)  PS re: PS(wHTC)  AC1
wHTC              46.2         1.00             0.994       41.6      1.00             0.988
Filtered wHTC     49.4         0.85             0.995       44        0.85             0.989
IIRN (−1 dB)      17.6         0.63             0.785       23.3      0.74             0.825
IIRN (−4 dB)      10.5         0.44             0.544       12.2      0.65             0.571
IIRN + HPN        12.5 (16.2)  0.60             0.384       –         –                –
CosN              12.8 (15.1)  0.42             0.485       –         –                –
128 channels      38.7         0.20             0.725       27.3      0.13             0.35
64 channels       25.7         0.09             0.328       20        0.00             0.074
32 channels       15.8 (23.5)  0.04             0.112       11.9      0.00             0.039
16 channels       7.1          0.00             0.042       4.6       0.00             0.03
8 channels        2.9          0.00             0.0007      3.6       0.00             0.026
1 channel         3.1 (1.7)    0.00             0.0013      2.3       0.00             0.023

wHTC: wideband harmonic tone complex

Filtered wHTC: wideband harmonic tone complex bandpass filtered from 200–7000 Hz

IIRN (−1 dB): infinitely iterated rippled noise where the delayed noise attenuation was −1 dB

IIRN (−4 dB): infinitely iterated rippled noise where the delayed noise attenuation was −4 dB

IIRN + HPN: IIRN (−1 dB) added to highpass noise having a 3 kHz lower cutoff frequency

CosN: cosine noise (i.e. one delay-and-add iteration of rippled noise)

128 channels – 1 channel: noise-vocoded versions of the wHTCs with the specified number of channels

Numbers in parentheses indicate the average peak-to-valley ratios for harmonics 1–5 only.

Behavioral Procedure

The testing cage measured 24″ wide × 24″ long × 14″ high; animals were free to roam around the cage and were not restrained in any way. The cage was located on a card table in a single-walled sound-attenuating chamber having internal dimensions of 5′2″ wide × 5′2″ long × 6′6″ high. A pellet dispenser was located at one end of the cage with a reward chute attached to a response lever. A loudspeaker was located next to the pellet dispenser, approximately 30° to the right of center, at a distance of 6″ in front of the chinchilla.

The training procedures (Shofner, 2002) and stimulus generalization procedures (Shofner, 2002; 2011; Shofner & Whitmer, 2006; Shofner, Whitmer, & Yost, 2005; 2007) have been described previously in detail. Briefly, a standard sound was presented continually in 500-ms bursts at a rate of once per second, regardless of whether or not a trial was initiated. The standard was a 1-channel noise-vocoded version of a wHTC. Chinchillas initiated a trial by pressing down on the response lever. The lever had to be depressed for a duration that varied randomly from trial to trial, ranging from 1.15 to 8.15 s for three chinchillas and from 1.15 to 6.15 s for the fourth chinchilla. After the lever had been depressed for the required duration, two 500-ms bursts of a selected sound were presented for that trial. The response window was coincident with the duration of the two 500-ms bursts (2000 ms), except that it began 150 ms after the onset of the first burst and lasted until the onset of the next burst of the continual standard stimulus; thus, the actual duration of the response window was 1850 ms.

The selected sounds presented during the response window could be signals, test sounds, or standards. A signal trial consisted of two bursts of the signal sound, which was either a wHTC or an IIRN depending on the specific experiment. If the animal released the lever during the response window of a signal trial, then this positive response was treated as a hit and was rewarded with a food pellet. A standard trial consisted of two additional bursts of the 1-channel noise-vocoded wHTC. If the animal continued to depress the lever throughout the response window of a standard trial, then this negative response was treated as a correct rejection. Food pellet rewards were generally not necessary to reinforce correct rejections. A lever release during a standard trial was treated as a false alarm; because false alarm rates were well below 20%, “time outs” following a false alarm were not necessary. A test trial consisted of two bursts of a test sound; test sounds were generally 8–128 channel noise-vocoded versions of the wHTCs, although rippled noises were occasionally used as well. Chinchillas did not receive food pellet rewards for responses to test stimuli, regardless of whether the behavioral response was positive or negative.

Chinchillas were tested in blocks consisting of 40 trials. Two different test sounds were presented in a block of trials. In each block, 60% of the trials were signal trials, 20% were standard trials, 10% were test sound #1 trials, and 10% were test sound #2 trials. Thus, test stimuli were presented infrequently in the block of trials. Behavioral responses were considered to be under stimulus control if the percent correct for the discrimination of the signal from the standard was at least 81% for each block. Responses were collected for a minimum of 50 blocks (i.e. 2000 total trials) resulting in at least 200 trials for each test sound.
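
The block composition described above can be summarized in a few lines; the sketch below (ours, for illustration only) builds one randomized 40-trial block.

```python
# One 40-trial block: 60% signal, 20% standard, 10% each of two test sounds.
import random

def make_block(seed=None):
    trials = (['signal'] * 24 + ['standard'] * 8 +
              ['test1'] * 4 + ['test2'] * 4)
    random.Random(seed).shuffle(trials)     # test trials occur infrequently
    return trials
```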

RESULTS

Experiment 1: wHTCs as Signals

Chinchillas were trained to discriminate a wHTC from a 1-channel noise-vocoded version of the wHTC. Behavioral responses are the number of lever releases relative to the number of trials, expressed as a percent (Figure 4). The responses obtained from the 4 chinchillas to the 500 Hz F0 wHTC signal are high, ranging from 92 to 97%, whereas the responses to the 1-channel noise-vocoded version of the wHTC are low, ranging from 2 to 11% (Figure 4A). That is, chinchillas can discriminate the wHTC from the 1-channel noise-vocoded version; d’s, estimated as the differences between z(hits) and z(false alarms), range from 2.7 to 4.0. When tested with vocoded versions of the wHTC using 8–128 channels, the responses of the chinchillas are low and are similar to those to the 1-channel version (Figure 4A & B). None of the animals tested gave large responses to the 128-channel noise-vocoded wHTC in spite of its large spectral peak-to-valley ratio, which is closer to that of the wHTC than to the peak-to-valley ratio of the 1-channel version (Table 1). A repeated-measures analysis of variance showed a significant effect of stimuli for the 500 Hz condition (F = 355.2; p ≪ 0.0001). Pairwise comparisons based on Tukey’s test showed that there was not a significant difference between the 1-channel and the 128-channel versions (q = 3.22; p > 0.05), but there was a significant difference between the 1-channel version and the unmodified wHTC (q = 41.8; p ≪ 0.001). In addition to the vocoding process, the software for the noise vocoder also bandpass filters the stimulus from 200–7000 Hz. To test whether the bandpass filtering itself affected the behavioral responses, chinchillas were also tested with a filtered version of the wHTC that was not subjected to the vocoding process. Pairwise comparisons based on Tukey’s test showed that there was not a significant difference between the filtered wHTC and the unmodified wHTC (q = 0.14; p > 0.05), but there was a significant difference between the 1-channel version and the filtered wHTC (q = 41.7; p ≪ 0.001). Similar behavioral responses were obtained when the F0 was 250 Hz (Figure 4C & D). A repeated-measures analysis of variance showed a significant effect of stimuli for the 250 Hz condition (F = 434.5; p ≪ 0.0001). Pairwise comparisons based on Tukey’s test showed that there was not a significant difference between the 1-channel and the 128-channel versions (q = 4.00; p > 0.05), nor between the filtered wHTC and the unmodified wHTC (q = −0.14; p > 0.05). There were significant differences between the 1-channel version and the unmodified wHTC (q = 46.1; p ≪ 0.001) as well as between the 1-channel version and the filtered wHTC (q = 46.3; p ≪ 0.001).
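
As a concrete illustration of the d’ estimate quoted above, a minimal sketch:

```python
# d' estimated as the difference between the z-transformed hit and
# false-alarm rates.
from scipy.stats import norm

def d_prime(p_hit, p_fa):
    return norm.ppf(p_hit) - norm.ppf(p_fa)

# e.g. 95% hits to the wHTC signal and 5% false alarms to the standard:
print(round(d_prime(0.95, 0.05), 2))   # 3.29, within the reported 2.7-4.0 range
```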

Figure 4.

Behavioral responses obtained from the stimulus generalization task for 4 chinchillas using the unmodified wHTC as the signal. (A). Responses obtained from individual chinchillas (c12, c15, c24, c36) are shown. The fundamental frequency of the wHTC is 500 Hz. (B). Averaged behavioral responses from the 4 chinchillas for the 500 Hz condition (black solid squares and line). Error bars indicate the 95% confidence intervals based on Tukey’s standard error. The black dashed line and open circles show the predictions based on the average peak-to-valley ratios (P-V) from the spectra. The gray solid line and gray diamonds show the predictions based on the periodicity strengths (PS) from the ACFs. The gray dashed line and open squares show the predictions based on AC1. (C). Responses obtained from individual chinchillas (c12, c15, c24, c36) are shown. The fundamental frequency of the wHTC is 250 Hz. (D). Same as B for the 250 Hz fundamental frequency condition. In all panels, numbers along the x-axis indicate the number of analysis/resynthesis channels of the vocoded sounds; filt and HTC indicate the filtered-only and unmodified harmonic tone complexes, respectively. The form of the predictions is (test − 1 channel)/(HTC − 1 channel).

These results suggest that the filtered wHTC is generalized to the wHTC, whereas the 8–128 channel noise-vocoded versions of the wHTC are generalized to the 1-channel version. Since the 1-channel version is essentially a broadband noise, the 8–128 channel versions of the wHTC are essentially generalized to wideband noise. If spectral peak-to-valley ratio were the acoustic cue controlling the behavioral responses, then a systematic decrease in behavioral response should be observed as the number of channels decreases (dashed black line and open circles in Figure 4B & D). Clearly, the behavioral and predicted functions differ greatly. It should be noted that for these and all subsequent predictions based on the acoustic features obtained from the stimuli, namely peak-to-valley ratio and periodicity strength (Table 1), it is assumed that the behavioral responses are proportional to the acoustic measures. However, for the predictions based on the height of the first peak in the autocorrelation function (AC1), it is assumed that the responses are proportional to $10 + 10^{2 \cdot AC1}$ (Yost, 1996). Given that the responses to the 8–128 channel versions are similar to the response to the 1-channel version, these stimuli appear to share a common acoustic feature, such as little or no periodicity strength (Table 1). The behavioral responses obtained can be accounted for if the animals were using periodicity strength as the acoustic cue (gray solid line and gray diamonds in Figure 4B & D). Note that the predictions based on AC1 (gray dashed line and open squares in Figure 4B & D) do not account for the behavioral responses as well as periodicity strength does. The sum-of-squares deviation between the data and predictions for the 500 Hz condition is 290.3 when periodicity strength is used, but 541.7 when AC1 is used. For the 250 Hz condition, these values are 392.4 and 401.6, respectively. These findings suggest that the behavioral responses are not controlled by the spectral structure, but may be controlled by the temporal structure.
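
To make the prediction scheme explicit, here is a minimal sketch; the AC1 transform follows the expression given above, and the Table 1 values in the example are from the 500-Hz, 128-channel condition.

```python
# Predictions normalized as (test - 1 channel) / (signal - 1 channel); the
# AC1-based measure is first passed through the 10 + 10^(2*AC1) transform.
import numpy as np

def predicted_response(m_test, m_standard, m_signal):
    return (m_test - m_standard) / (m_signal - m_standard)

def ac1_measure(ac1):
    return 10.0 + 10.0 ** (2.0 * np.asarray(ac1))

# Table 1, 500-Hz F0, 128-channel vocoded wHTC:
ps_pred = predicted_response(0.20, 0.00, 1.00)     # 0.20 of the signal response
ac1_pred = predicted_response(ac1_measure(0.725),
                              ac1_measure(0.0013),
                              ac1_measure(0.994))  # ~0.28 of the signal response
```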

Experiment 2: IIRNs as Signals

In order to further test the hypothesis that the behavioral responses were not controlled by the spectral peak-to-valley ratio, the wHTC signals were replaced with IIRN[+, d, −1 dB]. These IIRNs have average peak-to-valley ratios that are smaller than those of the 128-channel vocoded wHTCs (Table 1) for both the 500 Hz and 250 Hz F0s. Figure 5 shows responses obtained from 4 chinchillas to vocoded wHTCs when IIRN[+, d, −1 dB] was used as the signal for delays of 2 ms and 4 ms. The behavioral responses to the IIRN[2 ms, −1 dB] (Figure 5A & B) and IIRN[4 ms, −1 dB] (Figure 5C & D) are high, ranging from 94 to 98%, whereas the responses to the 1-channel noise-vocoded version of the wHTC are low, ranging from 1.5 to 11%. That is, chinchillas can discriminate these IIRNs from the 1-channel noise-vocoded versions, with d’s ranging from 2.9 to 3.8. When tested with vocoded versions of the wHTCs using 8–128 channels, the responses of the chinchillas are low and again are similar to those to the 1-channel version. It should be noted that for the 2 ms condition, the responses of two of the four animals to the 128-channel noise-vocoded wHTC are around 30–44% and are above those to the 1-channel version (Figure 5A), whereas the responses of the other two chinchillas to the 128-channel version are essentially identical to those to the 1-channel version. For the 4 ms condition, all of the animals gave low responses to the 128-channel vocoded wHTC that are similar to those to the 1-channel version (Figure 5C). A repeated-measures analysis of variance showed a significant effect of stimuli for the 2 ms delay condition (F = 125.3; p ≪ 0.0001) and for the 4 ms delay condition (F = 1112.5; p ≪ 0.001). Pairwise comparisons between the 1-channel and 128-channel versions showed no significant difference for the 2 ms condition (q = 4.55; p > 0.05) or for the 4 ms condition (q = 0.54; p > 0.05), but there were significant differences between the 1-channel version and the signal IIRN with −1 dB for both the 2 ms (q = 25.2; p < 0.001) and 4 ms conditions (q = 72.6; p ≪ 0.001). Animals were also tested with IIRNs having −4 dB delayed noise attenuation. Pairwise comparisons based on Tukey’s test showed no significant difference between the responses to the test IIRNs with −4 dB and the responses to the signal IIRNs with −1 dB for both the 2 ms (q = 0.18; p > 0.05) and 4 ms delay conditions (q = 1.85; p > 0.05). Pairwise comparisons showed that there was a significant difference between responses to the test IIRNs with −4 dB and those obtained for the 128-channel vocoded versions for the 2 ms (q = 20.5; p < 0.001) and 4 ms conditions (q = 71.3; p ≪ 0.001).

Figure 5.

Behavioral responses obtained from the stimulus generalization task for 4 chinchillas using the IIRN[+, d, −1 dB] as the signal. (A). Responses obtained from individual chinchillas (c12, c15, c24, c36) are shown. The delay of the IIRN is 2 ms. (B). Averaged behavioral responses from the 4 chinchillas for the 2 ms condition (black solid squares and line). Error bars indicate the 95% confidence intervals based on Tukey’s standard error. The black dashed line and open circles show the predictions based on the average peak-to-valley ratios (P-V) from the spectra. The gray solid line and gray diamonds show the predictions based on the periodicity strengths (PS) from the ACFs. The gray dashed line and open squares show the predictions based on AC1. (C). Responses obtained from individual chinchillas (c12, c15, c24, c36) are shown. The delay of the IIRN is 4 ms. (D). Same as B for the 4 ms delay condition. In all panels, numbers along the x-axis indicate the number of analysis/resynthesis channels of the vocoded sounds; iirn4 and iirn1 indicate the infinitely iterated rippled noises having delayed noise attenuations of −4 dB and −1 dB, respectively. The form of the predictions is (test − 1 channel)/(iirn1 − 1 channel).

These results suggest that the 8–128 channel noise-vocoded versions of the wHTCs are not generalized to the IIRN[+, d, −1 dB] for either the 2 ms or 4 ms delay, but rather are generalized to the 1-channel versions of the wHTCs. For the 2 ms condition, the 128-channel version is generalized as intermediate between the IIRN signal and the 1-channel version of the wHTC (Figure 5A) for two of the four chinchillas, which may be related to the weak periodicity strength of 0.2 for the 128-channel version. For the 4 ms condition, the 128-channel version is generalized to the 1-channel version by all four chinchillas tested. The predicted responses that would be obtained if the animals were using the average peak-to-valley ratio over harmonics 1–10 do not account for the obtained behavioral responses for either the 2 ms or 4 ms condition (black dashed line and open circles in Figure 5). Also, the responses to the test IIRNs with −4 dB are larger than those obtained for either the 128-channel or 64-channel vocoded wHTCs, in spite of the larger peak-to-valley ratios of these two vocoded wHTCs (Table 1). The behavioral responses can be accounted for if the animals were using periodicity strength as the acoustic cue (gray solid line and gray diamonds in Figure 5B & D), and again the predictions based on AC1 do not account for the behavioral responses as well as periodicity strength does. The sum-of-squares deviation between the data and predictions for the 2 ms delay condition is 889.7 when periodicity strength is used, but 7095.3 when AC1 is used. For the 4 ms delay condition, these values are 354.4 and 4312.3, respectively. These results again suggest that the behavioral responses are controlled by temporal structure rather than spectral structure.

Experiment 3: Testing Spectral Shape

As the number of analysis/resynthesis channels decreases, there is a change in the overall spectral shape of the vocoded wHTC, such that there can be a strong harmonic structure at low frequencies while, at higher frequencies, the spectrum shows the characteristics of a flat-spectrum noise (compare Figure 6A with Figure 3C). In order to test the influence that overall spectral shape may have on controlling the behavioral responses, IIRN[+, 2 ms, −1 dB] was added to a highpass noise (HPN) having a lower cutoff frequency of 3 kHz. This produced a sound having a harmonic structure at low frequencies and a flatter spectrum at higher frequencies (Figure 6B), similar to the 32-channel vocoded wHTC (Figure 6A). Three chinchillas discriminated the IIRN+HPN signal from the 1-channel noise-vocoded wHTC standard. Animals were tested with the 32-channel noise-vocoded version as well as with CosN (Figure 6C). For this experiment, the signal and both test sounds all have similar average peak-to-valley ratios (Table 1). Figure 7 shows responses obtained from the 3 chinchillas to the 32-channel vocoded wHTC and CosN when IIRN+HPN was the signal. A repeated-measures analysis of variance showed a significant effect of stimuli (F = 320.5; p ≪ 0.0001). Pairwise comparisons based on Tukey’s test showed that there was not a significant difference between responses to the 1-channel and 32-channel versions (q = 0.32; p > 0.05) or between CosN and IIRN+HPN (q = 0.40; p > 0.05). There was a significant difference between the responses to the 32-channel version and CosN (q = 31.4; p < 0.001).

Figure 6.

Spectra for the 32-channel noise-vocoded wHTC (A), IIRN[+, 2 ms, −1 dB] added to a highpass noise having a lower cutoff frequency of 3000 Hz (B), and CosN (C). The gray shaded areas indicate the frequencies corresponding to harmonics 1–5. Spectra have been smoothed using a 7-bin moving window average. Vertical lines indicate the frequencies of the harmonics. The delay used to generate the IIRN and CosN was 2 ms.

Figure 7.

Behavioral responses from 3 chinchillas using the stimuli illustrated in Figure 6. (A). Responses obtained from individual chinchillas (c12, c15, c24) are shown. (B). Averaged behavioral responses from the 3 chinchillas (black solid squares and line). Error bars indicate the 95% confidence intervals based on Tukey’s standard error. The black dashed line and open circles show the predictions based on the average peak-to-valley ratios (P-V) from the spectra. The gray solid line and gray diamonds show the predictions based on the periodicity strengths (PS) from the ACFs. The gray dashed line and open squares show the predictions based on AC1. The form of the predictions is (test − 1 channel)/(IIRN+HPN − 1 channel).

If average peak-to-valley ratio were the acoustic cue controlling the behavioral responses, then the chinchillas would be expected to generalize across all three of these stimuli (black dashed line and open circles in Figure 7B). Clearly, Figure 7B and the above statistical analysis show that the animals did not respond equally to the test sounds and the signal; average peak-to-valley ratio evidently does not control the behavioral response (Figure 7B). It is interesting to note that if behavioral performance reflected the average peak-to-valley ratios for only harmonics 1–5, where clear peaks in the spectra are observed (Figure 6A–B), then the responses to the 32-channel noise-vocoded wHTC should be high, given that the peak-to-valley ratio for this stimulus is larger than those of either the IIRN+HPN or CosN stimuli (values in parentheses in Table 1). If overall spectral shape were the acoustic cue, then the 32-channel vocoded wHTC should be generalized to the IIRN+HPN, but the CosN should not, because the spectrum of CosN shows ripples at all harmonics and does not show the characteristics of a flat-spectrum noise above 3 kHz. In contrast to these predictions, the responses to the 32-channel noise-vocoded wHTC are generalized to the 1-channel standard and not to the IIRN+HPN signal, even though the 32-channel version and the IIRN+HPN have similar overall spectral shapes. Also, the responses to the CosN are generalized to the IIRN+HPN signal, again a result not predicted by spectral shape.

The IIRN+HPN and CosN both have larger periodicity strengths than either the 1-channel or 32-channel versions of the wHTC (Table 1). The behavioral responses obtained can be accounted for if the animals are using periodicity strength as the acoustic cue (gray solid line and gray diamonds in Figure 7B). It is interesting to note that in this case behavioral responses can be better accounted for if the animals are using AC1 as the acoustic cue (gray dashed line and open squares in Figure 7B). The sum of squares deviation between the data and predictions is 742.6 when periodicity strength is used, but is 196.4 when AC1 is used. These results again suggest that the behavioral responses are controlled by temporal structure rather than spectral structure.

Experiment 4: Discrimination of 128-channel from 1-channel

The results of Experiments 1–2 indicate that the 128-channel noise-vocoded versions of wHTCs are generalized to broadband noise in the chinchilla. Stimuli are generalized (i.e., perceptually equivalent) when animals cannot discriminate between them (Guttman & Kalish, 1956). Given the large spectral differences between the 128-channel and 1-channel noise-vocoded wHTCs (Table 1), can chinchillas discriminate between these two stimuli? Three chinchillas were tested in a discrimination task in which animals now received food pellet rewards for positive responses to the 128-channel version (i.e., hits). In blocks of 40 trials, the 128-channel version of the 500 Hz F0 wHTC was presented on 32 trials and the 1-channel version on 8 trials. Hits and false alarms were measured for each block, and d’s were estimated for each block and over a total of 1200 trials (30 blocks) for two chinchillas and 2120 trials (53 blocks) for a third animal (Figure 8). If p(hits) or p(false alarms) was 1.0 or 0.0 for a block, then the probability was adjusted to 1 − 1/(2N) or 1/(2N), respectively, where N is the number of signal or blank trials (Macmillan & Creelman, 2005). One animal could discriminate the stimuli at a threshold level of performance, giving a d’ of 0.98 (c24 in Figure 8), while two animals could not discriminate between the stimuli, giving d’s of 0.19 and 0.33 (c12 and c15, respectively, in Figure 8). There was no evidence of an improvement in performance over the number of trials completed.
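
A sketch of the corrected d’ computation described above:

```python
# d' with perfect proportions adjusted by 1/(2N) before the z-transform
# (Macmillan & Creelman, 2005).
from scipy.stats import norm

def corrected_d_prime(n_hits, n_signal, n_fas, n_blank):
    floor_s, floor_b = 1.0 / (2 * n_signal), 1.0 / (2 * n_blank)
    p_hit = min(max(n_hits / n_signal, floor_s), 1 - floor_s)
    p_fa = min(max(n_fas / n_blank, floor_b), 1 - floor_b)
    return norm.ppf(p_hit) - norm.ppf(p_fa)

# A perfect 40-trial block (32 signal trials, 8 blank trials) caps d' at:
print(round(corrected_d_prime(32, 32, 0, 8), 2))   # 3.69
```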

Figure 8.

Performance of 3 chinchillas (c12, c15, c24) obtained for discriminating the 1-channel noise-vocoded wHTC from the 128-channel version. The fundamental frequency of the unmodified wHTC used for the vocoding process was 500 Hz. The d’ for each block of 40 trials was measured. The d’ shown in each panel is based on the hits and false alarms computed over all blocks.

DISCUSSION

The simplest spectral processing scheme for pitch perception is that the pitch percepts of HTCs are determined by the fundamental frequency component, either by its physical presence in the stimulus or by its reintroduction to the peripheral representation through cochlear distortion products. Similar to human listeners, non-human mammals appear to have pitch-like perceptions of the “missing fundamental” (Heffner & Whitfield, 1976; Chung & Colavita, 1976; Tomlinson & Schwarz, 1988; Shofner, 2011) that do not arise through the reintroduction of cochlear distortion products (Chung & Colavita, 1976; Shofner, 2011). Thus, this simplest form of spectral processing does not appear to exist in the auditory system of non-human mammals. An alternative spectral processing scheme would be that the pitch percepts are based on representations of the harmonic spectra within the auditory system (i.e., harmonic template matching). The results of the present study argue against the harmonic template model for the chinchilla. In the present study, chinchillas discriminated either a wHTC or IIRN (i.e., salient “pitch” percept) from a 1-channel noise-vocoded wHTC (i.e., noise percept). When tested with 8–128 channel noise-vocoded wHTCs, the behavioral responses obtained are not related to the peak-to-valley ratios of the harmonic components of the test stimuli. The results suggest that an auditory representation of the harmonic structure is not controlling the behavioral responses and that spectral processing contributes little to the perception of pitch in chinchillas.

Why is harmonic structure apparently not being processed by the chinchilla auditory system? Consider the representation of the harmonic components along the basilar membrane. Figure 9 illustrates the positions of two harmonics, H1 and H2, along the basilar membrane; each component falls within an idealized auditory filter that is centered at the frequency of the harmonic. The auditory filter bandwidth is given as equivalent rectangular spread (ERS), which expresses frequency in terms of position along the basilar membrane (Shera et al., 2010). Resolvability of the harmonics is expressed in terms of position along the basilar membrane and is defined as the difference between the upper cutoff frequency for the filter centered at H1 (XUH1) and the lower cutoff frequency for the filter centered at H2 (XLH2). If this difference is greater than zero, then the two auditory filters do not overlap in position along the basilar membrane and the harmonics are resolved. If the difference is less than zero, then the filters overlap and the harmonics are not resolved. ERS values for various center frequencies are available (Figure 16A of Shera et al., 2010), and position along the basilar membrane can be estimated from the Greenwood frequency-position functions (Greenwood, 1990). Resolvability decreases as harmonic number increases for both chinchilla and human cochleae (Figure 9). For humans, harmonics 1–10 are resolved, whereas only harmonics 1–2 are resolved in chinchillas. In addition, harmonics 1–2 are clearly farther apart in the human cochlea than in the chinchilla cochlea; that is, resolvability of harmonics 1–2 is better in humans than in chinchillas. The difference between resolvability in humans and chinchillas reflects the sharper cochlear tuning described in humans (Shera et al., 2002; 2010; Oxenham & Shera, 2003; Joris et al., 2011). Thus, although there can be a large peak-to-valley ratio between harmonics for noise-vocoded wHTCs, the representation of that harmonic structure is highly degraded in the chinchilla cochlea.
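
To make the resolvability computation concrete, the sketch below implements the position-based definition above. The human Greenwood constants are standard; the chinchilla constants and the ERS values must be taken from Greenwood (1990) and Shera et al. (2010, Figure 16A), respectively, so treat the defaults as assumptions.

```python
# Resolvability of adjacent harmonics in basilar-membrane position coordinates:
# each filter occupies an equivalent rectangular spread (ERS) around its
# harmonic's place; a positive gap between filter edges means "resolved".
import numpy as np

def greenwood_position(f, A, k, length_mm, a=2.1):
    """Distance from the apex (mm) of the place tuned to frequency f (Hz),
    from F = A * (10**(a*x/L) - k), inverted for x."""
    return length_mm * np.log10(f / A + k) / a

def resolvability(f0, n, ers_lo_mm, ers_hi_mm, A=165.4, k=0.88, length_mm=35.0):
    """Gap (mm) between the filters centered on harmonics n and n+1."""
    x_lo = greenwood_position(n * f0, A, k, length_mm)        # place of harmonic n
    x_hi = greenwood_position((n + 1) * f0, A, k, length_mm)  # place of harmonic n+1
    return (x_hi - ers_hi_mm / 2) - (x_lo + ers_lo_mm / 2)

# Defaults above are the human Greenwood constants; the chinchilla cochlea is
# much shorter (roughly 18-20 mm), with its own A and k (Greenwood, 1990), and
# the ERS at each harmonic's place comes from Shera et al. (2010, Fig. 16A).
```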

Figure 9.

Illustration of auditory filters defining resolvability of harmonic components, H1 and H2 along the basilar membrane. The harmonics each fall within separate auditory filters having bandwidths expressed as equivalent rectangular spread (ERS). The upper (XUH1) and lower (XLH2) cutoff frequencies of the two filters are shown by the equations. Resolvability is defined as the difference between these cutoff frequencies. The graph shows resolvability of harmonics 1–10 of a wHTC for chinchillas (black line and circles) and humans (gray line and diamonds). Filled symbols indicate harmonics for which resolvability > 0.

The behavioral responses to the vocoded wHTCs do appear to be related to the temporal structure of the ACFs, suggesting that pitch perception in chinchillas is largely based on temporal processing. Periodicity strength was a better predictor of the behavioral responses for conditions in which the F0 of the wHTC was 500 Hz, but was about equal to the AC1 predictions when the F0 of the wHTC was 250 Hz (Experiment 1). For Experiment 2, periodicity strength was the better predictor for conditions using IIRNs for both 2 ms and 4 ms delays, but AC1 was the better predictor in Experiment 3, when the delay was 2 ms and the IIRN signal was combined with the HPN. Thus, while the predictions based on periodicity strength or AC1 are mixed, they argue that an auditory representation based on temporal structure is controlling the behavioral responses in chinchillas. This conclusion is consistent with those of recent studies in gerbils (Klinge & Klump, 2009; 2010), in which these authors argue that the detection of mistuned harmonics arises largely through temporal processing as a consequence of a reduction in spatial selectivity along the shorter cochlea of the gerbil (Klinge, Itatani, & Klump, 2010). However, as previously noted, the spectrum and ACF are Fourier pairs, and consequently, any temporal processing scheme has a potentially viable, complementary spectral processing scheme. Although we argue in favor of temporal processing, we acknowledge that the behavioral responses obtained from the chinchillas could be based on an auditory representation of the harmonic structure that is highly degraded due to broader cochlear tuning. Moreover, any temporal representation is likely to be converted into a place code in the central auditory system (e.g. Langner et al., 2002; Bendor & Wang, 2005).

The analysis described above for resolvability suggests there may indeed be a greater role for spectral processing in humans than in non-human mammals, and it will be important to obtain behavioral responses to vocoded wHTCs in human listeners to clarify this issue. For example, since different complex sounds can evoke differences in pitch strength or salience (Fastl & Stoll, 1979; Yost, 1996; Shofner & Selas, 2002), the pitch strength of vocoded wHTCs should decrease systematically as the number of analysis/resynthesis channels used by the vocoder is reduced if pitch is extracted through spectral processing. Moreover, it is interesting to note that although Experiment 4 demonstrated poor performance in chinchillas for discriminating the 128-channel noise-vocoded wHTC from the 1-channel version (i.e., d’ ≤ 1.0), casual listening suggests that these stimuli are easily discriminated by human listeners. Performance was measured for 200 trials by the first author, and the p(hits) and p(false alarms) obtained were 1.0 and 0.0, respectively (i.e., infinite d’).

In conclusion, we argue that temporal processing can account for the behavioral responses in chinchillas, suggesting that temporal processing of pitch is the more primitive processing mechanism common across mammalian species, including humans. To account for the behavioral responses based on spectral processing, it must be assumed that cochlear tuning in non-human mammals is broader than tuning in humans, as suggested by the resolvability analysis described previously. Our results, along with those of Klinge et al. (2010), imply that spectral processing of pitch in non-human mammals is presumably less developed due to broader cochlear tuning and shorter cochleae, and thus pitch perception in non-human mammals must arise largely through temporal processing. As the modern human cochlea lengthened and tuning sharpened through evolution, the foundation was laid for additional neural mechanisms of pitch extraction in the spectral domain to evolve.

Acknowledgments

This research was supported in part by a grant from the National Institutes of Health (R01 DC 005596). The authors would like to express their appreciation to the three reviewers whose comments greatly improved the manuscript.

References

  1. Ayotte J, Peretz I, Hyde K. Congenital amusia: A group study of adults afflicted with a music-specific disorder. Brain. 2002;125:238–251. doi: 10.1093/brain/awf028. [DOI] [PubMed] [Google Scholar]
  2. Balaguer-Ballester E, Denham SL, Meddis R. A cascade autocorrelation model of pitch perception. Journal of the Acoustical Society of America. 2008;124:2186–2195. doi: 10.1121/1.2967829. [DOI] [PubMed] [Google Scholar]
  3. Bendor D, Wang X. The neuronal representation of pitch in primate auditory cortex. Nature. 2005;436:1161–1165. doi: 10.1038/nature03867. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Burns EM, Viemeister NF. Nonspectral pitch. Journal of the Acoustical Society of America. 1976;60:863–869. [Google Scholar]
  5. Burns EM, Viemeister NF. Played-again SAM: Further observations on the pitch of amplitude-modulated noise. Journal of the Acoustical Society of America. 1981;70:1655–1660. [Google Scholar]
  6. Chung DY, Colavita FB. Periodicity pitch perception and its upper frequency limit in cats. Perception & Psychophysics. 1976;20:433–437. [Google Scholar]
  7. Cohen MA, Grossberg S, Wyse LL. A spectral network model of pitch perception. Journal of the Acoustical Society of America. 1995;98:862–879. doi: 10.1121/1.413512. [DOI] [PubMed] [Google Scholar]
  8. Cousineau M, Demany L, Pressnitzer D. What makes a melody: The perceptual singularity of pitch sequences. Journal Acoustical Society of America. 2009;26:3179–3187. doi: 10.1121/1.3257206. [DOI] [PubMed] [Google Scholar]
  9. Cutler A, Chen HC. Lexical tone in Cantonese spoken-word processing. Perception & Psychophysics. 1997;59:165–179. doi: 10.3758/bf03211886. [DOI] [PubMed] [Google Scholar]
  10. Darwin CJ. Pitch and auditory grouping. In: Plack CJ, Oxenham AJ, Fay RR, Popper AN, editors. Pitch: Neural coding and perception. New York: Springer; 2005. pp. 278–305. [Google Scholar]
  11. de Cheveigne A. Cancellation model of pitch perception. Journal of the Acoustical Society of America. 1998;103:1261–1271. doi: 10.1121/1.423232. [DOI] [PubMed] [Google Scholar]
  12. Dowling WJ, Fujitani DS. Contour, interval, and pitch recognition in memory for melodies. Journal of the Acoustical Society of America. 1971;49:524–531. doi: 10.1121/1.1912382. [DOI] [PubMed] [Google Scholar]
  13. Dorman MF, Loizou PC, Rainey D. Speech intelligibility as a function of the number of channels of stimulation for signal processors using sine-wave and noise-band outputs. Journal Acoustical Society America. 1997;102:2403–2411. doi: 10.1121/1.419603. [DOI] [PubMed] [Google Scholar]
  14. Dudley H. Remaking speech. Journal of the Acoustical Society of America. 1939;11:169–177. [Google Scholar]
  15. Eustaquio-Martin A, Lopez-Poveda E. Isoresponse versus isoinput estimates of cochlear tuning. Journal of the Association for Research in Otolaryngology. 2011;12:281–299. doi: 10.1007/s10162-010-0252-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Fastl H, Stoll G. Scaling of pitch strength. Hearing Research. 1979;1:293–301. doi: 10.1016/0378-5955(79)90002-9. [DOI] [PubMed] [Google Scholar]
  17. Friesen L, Shannon RV, Baskent D, Wang X. Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants. Journal of the Acoustical Society of America. 2001;110:1150–1163. doi: 10.1121/1.1381538. [DOI] [PubMed] [Google Scholar]
  18. Gauthier B, Shi R. A connectionist study on the role of pitch in infant-directed speech. Journal of the Acoustical Society of America. 2011;130:EL380–EL386. doi: 10.1121/1.3653546. [DOI] [PubMed] [Google Scholar]
  19. Gelfer MP, Mikos VA. The relative contributions of speaking fundamental frequency and formant frequencies to gender identification based on isolated vowels. Journal of Voice. 2005;19:544–554. doi: 10.1016/j.jvoice.2004.10.006. [DOI] [PubMed] [Google Scholar]
  20. Goldstein JL. An optimum processor theory for the central formation of the pitch of complex tones. Journal of the Acoustical Society of America. 1973;54:1496–1516. doi: 10.1121/1.1914448. [DOI] [PubMed] [Google Scholar]
  21. Green T, Faulkner A, Rosen S. Spectral and temporal cues to pitch in noise-excited vocoder simulations of continuous-interleaved-sampling cochlear implants. Journal of the Acoustical Society of America. 2002;112:2155–2164. doi: 10.1121/1.1506688. [DOI] [PubMed] [Google Scholar]
  22. Greenwood DD. A cochlear frequency-position function for several species-29 years later. Journal of the Acoustical Society of America. 1990;87:2592–2605. doi: 10.1121/1.399052. [DOI] [PubMed] [Google Scholar]
  23. Guttman N. Laws of behavior and facts of perception. In: Koch S, editor. Psychology: A study of a science: Vol. 5. The process areas, the person, and some applied fields: Their place in psychology and in science. New York: McGraw-Hill; 1963. pp. 114–178. [Google Scholar]
  24. Gutman N, Kalish HI. Discriminability and stimulus generalization. Journal of Experimental Psychology. 1956;51:79–88. doi: 10.1037/h0046219. [DOI] [PubMed] [Google Scholar]
25. Heffner RS, Heffner HE. Behavioral hearing range of the chinchilla. Hearing Research. 1991;52:13–16. doi: 10.1016/0378-5955(91)90183-a.
26. Heffner H, Whitfield IC. Perception of the missing fundamental by cats. Journal of the Acoustical Society of America. 1976;59:915–919. doi: 10.1121/1.380951.
27. Houtsma AJM, Smurzynski J. Pitch identification and discrimination for complex tones with many harmonics. Journal of the Acoustical Society of America. 1990;87:304–310.
28. Hulse SH. The discrimination-transfer procedure for studying auditory perception and perceptual invariance in animals. In: Klump GM, Dooling RJ, Fay RR, Stebbins WC, editors. Methods in comparative psychoacoustics. Basel, Switzerland: Birkhauser-Verlag; 1995. pp. 319–330.
29. Joris PX, Bergevin C, Kalluri R, McLaughlin M, Michelet P, van der Heijden M, Shera CA. Frequency selectivity in Old-World monkeys corroborates sharp cochlear tuning in humans. Proceedings of the National Academy of Sciences USA. 2011;108:17516–17520. doi: 10.1073/pnas.1105867108.
30. Klinge A, Klump GM. Frequency difference limens of pure tones and harmonics within complex stimuli in Mongolian gerbils and humans. Journal of the Acoustical Society of America. 2009;125:304–314. doi: 10.1121/1.3021315.
31. Klinge A, Klump GM. Mistuning detection and onset asynchrony in harmonic complexes in Mongolian gerbils. Journal of the Acoustical Society of America. 2010;128:280–290. doi: 10.1121/1.3436552.
32. Klinge A, Itatani N, Klump GM. A comparative view on the perception of mistuning: Constraints of the auditory periphery. In: Lopez-Poveda EA, Palmer AR, Meddis R, editors. The neurophysiological basis of auditory perception. New York: Springer; 2010. pp. 465–475.
33. Langner G, Albert M, Briede T. Temporal and spatial coding of periodicity information in the inferior colliculus of awake chinchilla (Chinchilla laniger). Hearing Research. 2002;168:110–130. doi: 10.1016/s0378-5955(02)00367-2.
34. Lee YS, Vakoch DA, Wurm LH. Tone perception in Cantonese and Mandarin: A cross-linguistic comparison. Journal of Psycholinguistic Research. 1996;25:527–542. doi: 10.1007/BF01758181.
35. Licklider JCR. A duplex theory of pitch perception. Experientia. 1951;7:128–134. doi: 10.1007/BF02156143.
36. Loebach JL, Pisoni DB. Perceptual learning of spectrally degraded speech and environmental sounds. Journal of the Acoustical Society of America. 2008;123:1126–1139. doi: 10.1121/1.2823453.
37. Loizou PC, Dorman M, Tu Z. On the number of channels needed to understand speech. Journal of the Acoustical Society of America. 1999;106:2097–2103. doi: 10.1121/1.427954.
38. Malott RW, Malott MK. Perception and stimulus generalization. In: Stebbins WC, editor. Animal psychophysics: The design and conduct of sensory experiments. New York: Appleton-Century-Crofts; 1970. pp. 363–400.
39. McLachlan N. A neurocognitive model of recognition and pitch segregation. Journal of the Acoustical Society of America. 2011;130:2845–2854. doi: 10.1121/1.3643082.
40. Macmillan NA, Creelman CD. Detection theory: A user’s guide. 2nd ed. Mahwah, NJ: Lawrence Erlbaum Associates; 2005.
41. Meddis R, O’Mard L. A unitary model of pitch perception. Journal of the Acoustical Society of America. 1997;102:1811–1820. doi: 10.1121/1.420088.
42. Ohala JJ. Cross-language use of pitch: An ethological view. Phonetica. 1983;40:1–18. doi: 10.1159/000261678.
43. Oxenham AJ, Bernstein JGW, Penagos H. Correct tonotopic representation is necessary for complex pitch perception. Proceedings of the National Academy of Sciences USA. 2004;101:1421–1425. doi: 10.1073/pnas.0306958101.
44. Oxenham AJ, Shera CA. Estimates of human cochlear tuning at low levels using forward and simultaneous masking. Journal of the Association for Research in Otolaryngology. 2003;4:541–554. doi: 10.1007/s10162-002-3058-y.
45. Patterson RD, Handel S, Yost WA, Datta AJ. The relative strength of the tone and noise components in iterated rippled noise. Journal of the Acoustical Society of America. 1996;100:3286–3295.
46. Peretz I, Ayotte J, Zatorre RJ, Mehler J, Ahad P, Penhune VB, Jutras B. Congenital amusia: A disorder of fine-grain pitch discrimination. Neuron. 2002;33:185–191. doi: 10.1016/s0896-6273(01)00580-3.
47. Qin MK, Oxenham AJ. Effects of envelope-vocoder processing on F0 discrimination and concurrent-vowel identification. Ear & Hearing. 2005;26:451–460. doi: 10.1097/01.aud.0000179689.79868.06.
48. Ruggero MA, Temchin AN. Unexceptional sharpness of frequency tuning in the human cochlea. Proceedings of the National Academy of Sciences USA. 2005;102:18614–18619. doi: 10.1073/pnas.0509323102.
49. Shafiro V. Identification of environmental sounds with varying spectral resolution. Ear & Hearing. 2008;29:401–420. doi: 10.1097/AUD.0b013e31816a0cf1.
50. Shamma S, Klein D. The case of the missing pitch templates: How harmonic templates emerge in the early auditory system. Journal of the Acoustical Society of America. 2000;107:2631–2644. doi: 10.1121/1.428649.
51. Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M. Speech recognition with primarily temporal cues. Science. 1995;270:303–304. doi: 10.1126/science.270.5234.303.
52. Shannon RV, Zeng F-G, Kamath V, Wygonski J. Speech recognition with altered spectral distribution of envelope cues. Journal of the Acoustical Society of America. 1998;104:2467–2476. doi: 10.1121/1.423774.
53. Shera CA, Guinan JJ, Jr, Oxenham AJ. Revised estimates of human cochlear tuning from otoacoustic and behavioral measurements. Proceedings of the National Academy of Sciences USA. 2002;99:3318–3323. doi: 10.1073/pnas.032675099.
54. Shera CA, Guinan JJ, Jr, Oxenham AJ. Otoacoustic estimation of cochlear tuning: Validation in the chinchilla. Journal of the Association for Research in Otolaryngology. 2010;11:343–365. doi: 10.1007/s10162-010-0217-4.
55. Shofner WP. Perception of the periodicity strength of complex sounds by the chinchilla. Hearing Research. 2002;173:69–81. doi: 10.1016/s0378-5955(02)00612-3.
56. Shofner WP. Representation of the spectral dominance region of pitch in the steady-state temporal discharge patterns of cochlear nucleus units. Journal of the Acoustical Society of America. 2008;124:3038–3053. doi: 10.1121/1.2981637.
57. Shofner WP. Perception of the missing fundamental by chinchillas in the presence of low-pass masking noise. Journal of the Association for Research in Otolaryngology. 2011;12:101–112. doi: 10.1007/s10162-010-0237-0.
58. Shofner WP, Selas G. Pitch strength and Stevens’s power law. Perception & Psychophysics. 2002;64:437–450. doi: 10.3758/bf03194716.
59. Shofner WP, Whitmer WM. Pitch cue learning in chinchillas: The role of spectral region in the training stimulus. Journal of the Acoustical Society of America. 2006;120:1706–1712. doi: 10.1121/1.2225969.
60. Shofner WP, Yost WA. Discrimination of rippled-spectrum noise from flat-spectrum noise by chinchillas: Evidence for a spectral dominance region. Hearing Research. 1997;110:15–24. doi: 10.1016/s0378-5955(97)00063-4.
61. Shofner WP, Whitmer WM, Yost WA. Listening experience with iterated rippled noise alters the perception of ‘pitch’ strength of complex sounds in the chinchilla. Journal of the Acoustical Society of America. 2005;118:3187–3197. doi: 10.1121/1.2049107.
62. Shofner WP, Yost WA, Whitmer WM. Pitch perception in chinchillas (Chinchilla laniger): Generalization using rippled noise. Journal of Comparative Psychology. 2007;121:428–439. doi: 10.1037/0735-7036.121.4.428.
63. Siegel JH, Cerka AJ, Recio-Spinoso A, Temchin AN, van Dijk P, Ruggero MA. Delays of stimulus-frequency otoacoustic emissions and cochlear vibrations contradict the theory of coherent reflection filtering. Journal of the Acoustical Society of America. 2005;118:2434–2443. doi: 10.1121/1.2005867.
64. Smith ZM, Delgutte B, Oxenham AJ. Chimaeric sounds reveal dichotomies in auditory perception. Nature. 2002;416:87–90. doi: 10.1038/416087a.
65. Tomlinson RWW, Schwarz DWF. Perception of the missing fundamental in nonhuman primates. Journal of the Acoustical Society of America. 1988;84:560–565. doi: 10.1121/1.396833.
66. Turner RS. The Ohm-Seebeck dispute, Hermann von Helmholtz, and the origins of physiological acoustics. British Journal for the History of Science. 1977;10:1–24. doi: 10.1017/s0007087400015089.
67. Wightman FL. The pattern-transformation model of pitch. Journal of the Acoustical Society of America. 1973;54:407–416. doi: 10.1121/1.1913592.
68. Winter IM, Wiegrebe L, Patterson RD. The temporal representation of the delay of iterated rippled noise in the ventral cochlear nucleus of the guinea-pig. Journal of Physiology. 2001;537:553–566. doi: 10.1111/j.1469-7793.2001.00553.x.
69. Ye Y, Connine CM. Processing spoken Chinese: The role of tone information. Language & Cognitive Processes. 1999;14:609–630.
70. Yost WA. Pitch strength of iterated rippled noise. Journal of the Acoustical Society of America. 1996;100:3329–3335. doi: 10.1121/1.416973.
71. Yost WA, Hill R. Models of the pitch and pitch strength of ripple noise. Journal of the Acoustical Society of America. 1979;66:400–410.
72. Yost WA, Patterson R, Sheft S. A time domain description for the pitch strength of iterated rippled noise. Journal of the Acoustical Society of America. 1996;99:1066–1078. doi: 10.1121/1.414593.
