Abstract
Adult listeners perceive pitch with fine precision, with many adults capable of discriminating less than a 1 % change in fundamental frequency (F0). Although there is variability across individuals, this precise pitch perception is an ability ascribed to cortical functions that are also important for speech and music perception. Infants display neural immaturity in the auditory cortex, suggesting that pitch discrimination may improve throughout infancy. In two experiments, we tested the limits of F0 (pitch) and spectral centroid (timbre) perception in 66 infants and 31 adults. Contrary to expectations, we found that infants at both 3 and 7 months were able to reliably detect small changes in F0 in the presence of random variations in spectral content, and vice versa, to the extent that their performance matched that of adults with musical training and exceeded that of adults without musical training. The results indicate high fidelity of F0 and spectral-envelope coding in infants, implying that fully mature cortical processing is not necessary for accurate discrimination of these features. The surprising difference in performance between infants and musically untrained adults may reflect a developmental trajectory for learning natural statistical covariations between pitch and timbre that improves coding efficiency but results in degraded performance in adults without musical training when expectations for such covariations are violated.
Keywords: cortical maturation, auditory development, pitch perception, timbre perception
INTRODUCTION
Pitch is a fundamental perceptual attribute of sound that can be ordered on a scale from high to low, and is most closely related to the fundamental frequency (F0) or repetition rate of a stimulus (ANSI 2013). Some adult listeners perceive pitch with extremely fine precision, with the ability to discriminate less than a 1 % change in F0, particularly after training (e.g., Lau et al. 2017b; Micheyl et al. 2006). Studies investigating the neural correlates of pitch in humans and other primates have indicated the involvement of the auditory cortex in the processing of pitch-evoking sounds (Zatorre et al. 1992; Penagos et al. 2004; Bendor and Wang 2005; Norman-Haignere et al. 2013).
Infant pitch perception is interesting in this regard because of the protracted development of the central auditory system, including the auditory cortex (Moore and Guan 2001). Although infants begin responding to sound in the third trimester of gestation (Birnholz and Benacerraf 1983), it is hypothesized that brainstem processing supports early responses to sound because of cortical immaturity (Eggermont and Moore 2012; Lau and Werner 2012). Given the hypothesized cortical involvement in pitch, and its slow developmental trajectory, it may be that infant pitch perception is similarly slow to develop.
Many studies have shown that infants are sensitive to large frequency or F0 changes in both speech and music during the first few months of life. Infants can discriminate the frequency of pure tones (Olsho 1984), the F0 of complex tones (Montgomery and Clarkson 1997; He and Trainor 2009; Lau and Werner 2012, 2014), and musical melodies (Trehub et al. 1985; Plantinga and Trainor 2009; Lau et al. 2017a), as well as lexical tones (Mattock et al. 2008) and F0 contours of syllables and words (Karzon and Nicholas 1989). Although infants perceive pitch, it is unclear whether they are able to discriminate subtle changes with the acuity of adult listeners. For pure tones, infants discriminate changes in frequency as well as adults by 6 months at high frequencies (Olsho 1984) but at low frequencies, discrimination thresholds do not seem to reach adult-like levels until 13–14 years of age (Maxon and Hochberg 1982). Immature pure-tone frequency-discrimination thresholds have also been reported for school-aged children at both high and low frequencies (Buss et al. 2014). However, as Buss et al. point out, children’s performance is likely influenced by non-sensory factors such as memory, sustained attention, and testing method (Buss et al. 2014). Furthermore, with pure tones, it is difficult to determine whether participants are discriminating on the basis of pitch or a change in timbre.
To investigate the influence of cortical maturation on pitch perception of complex tones, we compared F0 discrimination in 3-month-olds, 7-month-olds, and adults. To prevent participants from using the lowest spectral edge or the spectral centroid of the stimulus rather than the F0 (e.g., Houtsma and Smurzynski 1990), random variations in the spectral content were introduced, which resulted in changes in the sounds’ perceived brightness, one dimension of timbre — the perceptual aspect of sound that allows us to distinguish between different instruments (e.g., a piano and a violin) or voices (e.g., male and female) when they produce the same pitch at the same loudness. Infants at 3 and 7 months were tested because the organization of the auditory pathways appears to be distinctly different at these two ages. At 3 months, the auditory cortex is markedly immature, with activation of only the most superficial layer of the cortex by the reticular activating system; by 7 months, thalamocortical connections show increasing axonal conduction velocity, although significant immaturities in the auditory cortex persist (Eggermont and Moore 2012). To the extent that the thalamocortical pathways are required for precise F0 perception, we predicted a trajectory of improving F0 discrimination abilities from 3-month-olds to 7-month-olds to adults.
EXPERIMENT 1: F0 DISCRIMINATION WITH SPECTRAL VARIATION
This experiment tested the limits of infants’ and adults’ ability to detect a change in the F0 (the physical correlate of pitch) within a sequence of complex tones, each containing a random selection of consecutive harmonics, leading to random changes in brightness from tone to tone. In some sequences, a change in the F0 was introduced (“change” trial; Fig. 1A); in others, the F0 remained constant throughout the sequence (“no-change” trial; Fig. 1B). The same observer-based psychophysical procedure (Werner 1995) was used for both infants and adults, so that their discrimination abilities could be compared directly.
Fig. 1.
Schematic diagram of a change (A) and a no-change trial (B) in experiment 1. A At the start of a session, background tones were played repeatedly. Random variation of which harmonics were included in the tone was used to produce changes in brightness but not a change in pitch (blue tones). On change trials, a pitch change was introduced and the changed tones were played four times (green tones; F0 + ΔF0). B On no-change trials, the background tones continued to play throughout the trial.
Method
Participants
Participants were 26 infants, including 13 3-month-olds (7 female and 6 male) and 13 7-month-olds (8 female and 5 male), and 21 adults, including 10 (9 female and 1 male) with little or no musical training and 11 (6 female and 5 male) with 2 or more years of formal musical training, including school band or choir. Adults with musical training can typically discriminate smaller changes in F0 than non-musicians prior to explicit training (Micheyl et al. 2006; Madsen et al. 2019) and were included in order to obtain a full range of adult F0 discrimination abilities. A sample size of 10 per group was determined a priori based on estimates of variability obtained from our previous studies of infant pitch perception (Lau and Werner 2012, 2014; Lau et al. 2017a) and was sufficient to detect a difference between groups when one group reliably reaches criterion performance on the task (80 % or more meet criterion) and another group does not (20 % or fewer meet criterion) with a power (1 − β) of 0.8 and a significance level (α) of 0.05 (Rosner 1995, p. 384).
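The sample-size figure can be checked with a short calculation. The sketch below assumes the standard two-sided normal-approximation formula for comparing two independent proportions with equal group sizes; the text cites Rosner (1995) without printing the formula, so the choice of formula and the function name `n_per_group` are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Per-group n for a two-sided test of two proportions (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z(power)            # quantile corresponding to the desired power
    p_bar = (p1 + p2) / 2        # pooled proportion under the null hypothesis
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# 80 % of one group versus 20 % of the other meeting criterion
print(n_per_group(0.8, 0.2))  # 10
```

Under these assumptions the unrounded value is about 9.6, which rounds up to the 10 participants per group stated above.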
All infants were born full term, had no history of otitis media within 2 weeks of testing, had no history of health or developmental concerns, and had passed their newborn hearing screening. Infants in each group completed testing within 10 days of the specified age. Adult participants were between 18 and 30 years of age, reported normal hearing, and had no prior participation in psychoacoustic experiments. Adult participants were required to pass an audiometric screening at 20 dB hearing level at octave frequencies between 250 and 8000 Hz at the time of testing. Data were excluded from five additional infants (4 failed training due to fussiness or inattention, 1 failed tympanometric screen) and five additional adults (4 failed training, 1 failed to complete testing). Adult participants were tested over an average of 4 sessions within a single 1 h visit. Infants were tested over an average of 7 sessions in up to three 1 h visits (mean = 2.3 visits) because they required more breaks. All procedures were conducted according to protocols approved by the Institutional Review Board at the University of Washington (protocol #: 29,813, approved May 17, 2017) and informed consent was obtained from all participants or their legal guardians prior to testing.
Stimuli
The stimuli consisted of sequences of 650 ms harmonic complex tones, including 50 ms linear rise/fall times, separated by 500 ms gaps. Six consecutive harmonics of each complex tone were generated and then individually scaled to produce slopes of 12 dB per octave, spectrally centered on the complex tone, with no flat bandpass region. The components were combined in random phase. The random variation of harmonic numbers on each presentation (with no energy at the F0 itself) was incorporated to limit participants’ ability to respond to spectral changes as opposed to F0 (e.g., Moore and Glasberg 1990; Moore and Moore 2003; Micheyl and Oxenham 2004; Micheyl et al. 2012); the 12 dB per octave slopes were incorporated in order to reduce any salient pitch cues related to the spectral edges of the stimulus (e.g., Kohlrausch and Houtsma 1992).
The baseline F0 was around 200 Hz, similar to that of an average female voice. Seven differences in F0 (∆F0) were used, ranging from 0 to 5 %. The tones at F0 had lowest harmonic numbers ranging from 3 to 7, and the tones at F0 ± ΔF0 had lowest harmonic numbers ranging from 2 to 6, thereby introducing random variations in brightness. The standard F0 was varied in the range from 195 to 200 Hz between blocks with different ∆F0 values to prevent repeated exposure to the same background tones.
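The tone construction described above can be sketched in a few lines of standard-library Python. This is an illustration only, not the authors' stimulus code: the sampling rate, the placement of the envelope peak at the geometric mean of the lowest and highest component, the unnormalized amplitudes, and the name `harmonic_complex` are all assumptions, and the pink-noise masker and level calibration are omitted.

```python
import math
import random


def harmonic_complex(f0, lowest_harmonic, n_harmonics=6, slope_db_per_oct=12.0,
                     dur_s=0.650, ramp_s=0.050, fs=48000, rng=None):
    """Return (component frequencies, linear gains, samples) for one tone."""
    rng = rng or random.Random()
    freqs = [h * f0 for h in range(lowest_harmonic, lowest_harmonic + n_harmonics)]
    # Assumption: the envelope peak sits at the geometric mean of the band edges.
    centre = math.sqrt(freqs[0] * freqs[-1])
    # Attenuate each component by 12 dB per octave of distance from the peak.
    gains = [10 ** (-slope_db_per_oct * abs(math.log2(f / centre)) / 20)
             for f in freqs]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]  # random phase
    n, n_ramp = round(dur_s * fs), round(ramp_s * fs)
    samples = []
    for i in range(n):
        t = i / fs
        s = sum(g * math.sin(2.0 * math.pi * f * t + p)
                for f, g, p in zip(freqs, gains, phases))
        env = min(1.0, i / n_ramp, (n - 1 - i) / n_ramp)  # linear rise and fall
        samples.append(env * s)
    return freqs, gains, samples


rng = random.Random(1)
# Background tone: lowest harmonic drawn from 3 to 7.
background = harmonic_complex(200.0, rng.randint(3, 7), rng=rng)
# Changed tone at a 0.25 % higher F0: lowest harmonic drawn from 2 to 6.
changed = harmonic_complex(200.0 * 1.0025, rng.randint(2, 6), rng=rng)
```

Because the lowest harmonic is redrawn for every tone, the spectral centroid (and hence brightness) varies from tone to tone while the F0 stays fixed within the background sequence.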
The complex tones were presented at an overall level of 70 dB SPL and were embedded in a 65 dB SPL pink noise (1 to 12,000 Hz) to reduce the audibility of any distortion products. The stimuli were then presented via an Etymotic ER-2 insert earphone to the right ear. Sound pressure levels were calibrated in a Zwislocki coupler and checked in the subject’s ear canal with an Etymotic ER-7 probe microphone system at the start of testing. A 1 kHz pure tone was presented and if the sound pressure level was not measured to be within 2 dB of the intended sound level, the ER-2 insert earphone was removed and replaced in the ear canal. Foam ear tips were shaved to fit infant ear canals, as needed. Sound pressure levels were subsequently rechecked before the start of testing. Testing was conducted in a sound-attenuating booth. Half the participants heard F0 increases (+ ΔF0) while the other half heard F0 decreases (− ΔF0), randomly determined.
Procedure
The task was to listen to the sequence of complex tones and respond when there was an F0 change. Only the participant heard the stimuli, and the experimenter (who was blind to the trial type) had to judge whether a change trial was presented, based solely on the observed behavior of the participant. This observer-based procedure has a long history in both visual (Teller 1979) and auditory (Werner 1995) psychophysics and has been widely used in developmental studies (Lau and Werner 2012; Benasich et al. 2014; Horn et al. 2017).
During testing, infants sat on a caregiver’s lap inside a sound-attenuated booth. An assistant in the booth manipulated toys to keep the infants facing midline. There were two mechanical toys in a dark Plexiglas box and a monitor to the participant’s right. Either the toys were activated or a video was presented to reinforce the infant’s response to a change in F0. The experimenter sat outside the sound booth and observed through a window. The adults in the booth were unable to hear the sounds because the stimuli were presented to the infant via an insert earphone. As an extra precaution, the caregiver and assistant both wore circumaural headphones, with the caregiver listening to music and the assistant listening to the experimenter’s instructions. The experimenter outside the sound booth, who also could not hear the experimental stimuli, wore headphones and a microphone to communicate with the assistant inside the booth.
At the start of testing, background tones with the same F0 were played repeatedly. The experimenter initiated a trial when the participant was quiet and facing midline. To control for response bias, both change trials and no-change trials were presented to calculate hit and false-alarm rates, respectively (Green and Swets 1966). On change trials, four consecutive tones at F0 ± ΔF0 were played before the tones reverted to F0 (Fig. 1A); on no-change trials, F0 tones continued to repeat (Fig. 1B). The experimenter had 4 s from trial onset to decide which trial type had occurred. Typical behaviors used to make judgments included infants’ eye-darts and head-turns towards reinforcers (mechanical toys or video), as well as facial expressions, like eye-widening. Computer feedback was provided to the experimenter after each trial. Adult participants sat alone in the booth and were instructed simply to raise their hand when they heard the change in sound that activated the toys. In all other respects, the stimuli and procedure were the same for the adult and infant participants.
Conditioning and Training
Participants were trained to respond during a conditioning phase with the pairing of a large F0 change (5 % ΔF0) and the activation of mechanical animals or a video. The probability of a change trial was 0.8 and the reinforcer was activated after every change trial regardless of the experimenter’s response. The experimenter had to respond correctly on 4 of 5 consecutive change trials and 1 no-change trial within a maximum of 15 trials to progress to the training phase.
In the training phase, the task and stimuli were the same, but the probability of a change trial was 0.5 and the reinforcer was activated only when the experimenter correctly identified a change trial. Participants were required to achieve a hit rate of at least 80 % and a false-alarm rate of at most 20 % over the last 5 change and no-change trials within 40 trials in order to pass the training phase. If the criterion was not met, the session was discontinued, and participants were given a break. Participants had up to four attempts to reach criterion to progress to the test phases.
After passing the training phase, participants were tested on up to 6 additional phases that presented progressively smaller F0 changes, with the probability of a change trial remaining at 0.5. The values of ∆F0 were 2.5, 1.5, 1, 0.5, 0.25, and 0 % of the F0. In order to progress to the next ∆F0, participants were required to reach the pass criterion of at least an 80 % hit rate with a false-alarm rate of at most 20 % on the last 5 change trials and the last 5 no-change trials, corresponding to a sensitivity (d′) of about 1.68 or better. This equates to correct responses on at least 4 of the last 5 change trials and at least 4 of the last 5 no-change trials before a step down was taken. The pass criterion was chosen to depend on both hit rate and false-alarm rate over multiple trials to reduce the effects of any potential bias differences between infants and adults on the outcome of the procedure.
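The link between the pass criterion and d′ can be made concrete with a minimal sketch. It assumes the usual equal-variance signal-detection definition, d′ = z(hit rate) − z(false-alarm rate); the function names are hypothetical. Hit or false-alarm rates of exactly 0 or 1 would need a correction that the text does not specify, so this sketch simply does not handle them.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate):
    """d' = z(H) - z(F); both rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def meets_criterion(change_correct, no_change_correct, window=5, min_correct=4):
    """True if at least 4 of the last 5 change trials and at least 4 of the
    last 5 no-change trials were judged correctly (booleans in trial order)."""
    if len(change_correct) < window or len(no_change_correct) < window:
        return False
    return (sum(change_correct[-window:]) >= min_correct
            and sum(no_change_correct[-window:]) >= min_correct)


# Hit rate of 80 % with a false-alarm rate of 20 %
print(round(d_prime(0.8, 0.2), 2))  # 1.68
```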
To account for potential inattention, the ∆F0 reverted to the previous (larger) ∆F0 value following four consecutive incorrect responses. If participants responded correctly on 5 of 6 consecutive trials in a maximum of 15 trials at the previous ∆F0, demonstrating that they were again attending to the task, testing resumed at the smaller ∆F0 value. Otherwise, the session was discontinued. Participants had a maximum of four attempts at each ∆F0, at which point testing was discontinued. Statistical analyses of data collected were conducted using SPSS Version 19 and R 4.0.3 with the mosaic package.
Results
To address our initial question of whether 7-month-old infants outperform 3-month-old infants in F0 acuity, we compared the proportion of each group that successfully performed the task at the smallest non-zero F0 difference (0.25 %). As shown in Fig. 2A, there were no clear differences between the two infant groups, with 11 of 13 of both the 3-month-olds and 7-month-olds (~ 85 % of each group; 95 % binomial exact confidence interval, CI [55 %, 98 %]) achieving criterion performance. In contrast, none of the 10 adult participants without musical training (0 %; 95 % binomial exact CI [0 %, 31 %]) achieved criterion performance at this F0 difference. Thus, the proportion of successful infants at the smallest F0 change was significantly greater than the proportion of adults without musical training, χ2(1, N = 23) = 16.22, P < 0.0001, for each infant group. Of the adult group with musical training, 8 of 11 (73 %, 95 % binomial exact CI [39 %, 94 %]) reached criterion. This proportion was significantly higher than that of the musically untrained adults, χ2(1, N = 21) = 11.75, P = 0.001, but was not significantly different from the proportion for either the 3- or 7-month-old infants, χ2(1, N = 24) = 0.51, P = 0.48, in both cases. As expected, no participants achieved criterion performance at the 0 % F0 difference.
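The reported test statistics and exact intervals can be reproduced from the pass counts alone. The sketch below assumes a Pearson chi-square test on a 2 × 2 table without continuity correction (which matches the reported values) and a Clopper-Pearson interval solved by bisection; the analyses were run in SPSS and R, so these standard-library functions and their names are illustrative stand-ins, not the authors' code.

```python
import math


def chi2_2x2(pass_a, n_a, pass_b, n_b):
    """Pearson chi-square (1 df, no continuity correction) for two pass rates.
    Undefined when every participant passes or every participant fails."""
    a, b = pass_a, n_a - pass_a
    c, d = pass_b, n_b - pass_b
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials, by bisection."""
    def solve(f, target):  # f must be decreasing in p
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


chi2, p = chi2_2x2(11, 13, 0, 10)   # one infant group vs. untrained adults
print(round(chi2, 2), p < 0.0001)   # 16.22 True
print(exact_ci(11, 13))             # roughly (0.55, 0.98)
```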
Fig. 2.
A Proportion of participants that reached the pass criterion as a function of F0 change in experiment 1. Different symbols and colors represent the different groups, as shown in the legend. B Mean d′ values (± 1 SEM) at the 0.25 % ΔF0 for the different groups, as indicated by the symbols and colors. The number of participants in each group is shown in parentheses by each symbol.
In a secondary analysis to determine whether there were systematic differences in sensitivity between the groups for those achieving criterion performance at a given F0 difference, we calculated mean d′ values at each value of ∆F0. Of the participants that reached the 0.25 % ∆F0 value, average d′ values for the 3-month-old, 7-month-old, and musician adult groups on the last five change and no-change trials were greater than 1 (3 months: N = 11, mean d′ = 1.84, SEM = 0.11; 7 months: N = 11, mean d′ = 1.85, SEM = 0.12; musicians: N = 8, mean d′ = 1.91, SEM = 0.23), whereas the average d′ value for the musically untrained participants who reached the smallest ∆F0 value was − 0.37 (N = 6, SEM = 0.29), which was not significantly different from zero (one-sample t-test, t(5) = − 1.30, P = 0.25, 95 % CI [− 1.11, 0.36], Cohen’s d = − 0.53), consistent with chance-level performance (Fig. 2B). The four musically untrained adults who did not reach the smallest ∆F0 value failed either at ∆F0 = 2.5 % (N = 2) or at ∆F0 = 0.5 % (N = 2). Similar values of d′ between groups were also found at all other values of ∆F0 (Fig. 3). The fact that sensitivity was similar between groups for participants achieving the criterion level of performance suggests that any differences between the groups, in terms of the number of participants actually achieving the criterion level of performance, reflect true differences in sensitivity rather than simply between-group differences in response bias.
Fig. 3.
Mean d′ as a function of F0 change for participants that reached the pass criterion for each ∆F0. The number of participants that contributed to each average is shown by the color-coded numbers. Non-musician adults who failed to reach criterion are shown in black.
To assess the stability of the results, the proportions of infants and adults reaching criterion at the 0.25 % ∆F0 value were resampled using bootstrapping with replacement to generate 1000 replications. The 95 % CIs using the percentile method were then computed based on the bootstrap sampling distribution. The bootstrapped CI for adults without musical training (95 % percentile bootstrap CI [0 %, 0 %]) did not overlap with the 3-month-olds’ (95 % percentile bootstrap CI [62 %, 100 %]) or the 7-month-olds’ (95 % percentile bootstrap CI [62 %, 100 %]), confirming the significant difference in performance between the adults without musical training and both groups of infants.
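The resampling step can be illustrated with a minimal percentile bootstrap. The index convention used to pick the 2.5th and 97.5th percentiles, the fixed seed, and the name `bootstrap_ci` are assumptions; other percentile conventions give slightly different bounds, so the output for the infant groups need not match the reported interval exactly.

```python
import random


def bootstrap_ci(outcomes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the proportion of passes (1) among outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample participants with replacement and record each replicate's pass rate.
    props = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lo = props[int((alpha / 2) * n_boot)]
    hi = props[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


infants = [1] * 11 + [0] * 2    # 11 of 13 reached criterion at 0.25 %
untrained_adults = [0] * 10     # 0 of 10 reached criterion

print(bootstrap_ci(infants))
print(bootstrap_ci(untrained_adults))  # (0.0, 0.0)
```

When no participant in a group passes, every resample has a pass rate of zero, which is why the interval for the musically untrained adults collapses to [0 %, 0 %].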
EXPERIMENT 2: SPECTRAL CENTROID DISCRIMINATION WITH F0 VARIATION
The results from experiment 1 did not support our initial prediction that 7-month-olds would outperform 3-month-olds in an F0-discrimination task. In fact, the two groups of infants performed similarly and a higher proportion of infants in each group reached the pass criterion at the smallest value of ∆F0 than was found for the musically untrained adults. One possible explanation for this surprising outcome is that the variations in spectrum (due to the randomly selected harmonics in each trial) interfered with adults’, but not infants’, ability to discriminate F0. That might occur because the infants perceived the spectral variations, but the variations did not interfere with their F0 discrimination, or because the infants simply did not perceive the spectral variations. Experiment 2 was designed to distinguish between these two possibilities by testing the limits of infants’ and adults’ ability to perceive changes in the spectral envelope of harmonic complex tones. The task was the same as in experiment 1 except that the parameters were reversed: Participants were required to detect a change in spectral envelope (induced by varying the spectral centroid of the stimulus) while ignoring random and uninformative variations in F0.
Method
Participants
The new participants were 20 3-month-olds (12 female, 8 male), 20 7-month-olds (7 female, 13 male), and 10 musically untrained adults (5 female, 5 male). As in experiment 1, a minimum sample size of 10 per group was determined, based on an ability to detect a difference between two groups in which proportions of 0.8 and 0.2 achieve the performance criterion. All recruitment and participant inclusion criteria were the same as for experiment 1.
Stimuli
All tones were 500 ms harmonic complexes, including 20 ms raised-cosine rise/fall ramps, separated by silent gaps of 500 ms. All harmonics up to 10 kHz were generated and individually scaled to produce slopes of 24 dB per octave around the appropriate center frequency (CF), with no flat bandpass region. The components were combined in sine phase. Although the harmonics in experiment 1 were combined in random phase, we prioritized maintaining stimulus parameters consistent with Allen and Oxenham (2014) to allow for comparison of adult thresholds. This phase difference across experiments is not expected to have any effects on performance because phase effects are known to affect F0 discrimination and the timbre of unresolved harmonics but not the mostly spectrally resolved harmonics used here (Plomp and Steeneken 1969; Houtsma and Smurzynski 1990).
The CF of each complex determined its brightness. Six values of ∆CF, from 0 to 15 % of the CF, were used to test spectral-envelope discrimination. For each CF and CF ± ∆CF value, tones were generated with F0s of 170, 180, 190, 200, 210, and 220 Hz. The standard CF was varied between 1000 and 1100 Hz between blocks of trials to prevent repeated exposure to background tones with the same CF. Half the participants heard CF increases while the other half heard CF decreases, randomly determined.
Procedure
Stimulus presentation, calibration, and procedures were the same as for experiment 1. Only three ∆CFs were tested for each infant, to reduce the testing time per infant. All adults and all but one infant completed testing in a single visit of approximately 1 h. Both infants and adults were conditioned and trained to categorize two sets of complexes that differed by 15 % in their CF. Infants were then randomly assigned to one of two groups, with group 1 tested on 10 %, 5 %, and 0 % ΔCFs and group 2 tested on 2 %, 0.5 %, and 0 % ΔCFs. Adults were tested with all ΔCF values.
Results
Infants at both ages were able to discriminate smaller changes in spectrum than the musically untrained adults (Fig. 4A). At the ∆CF of 0.5 %, only 1 of 10 adult listeners (10 %, binomial exact 95 % CI [3 %, 45 %]) reached the pass criterion, compared with 9 of 10 3-month-olds (90 %, 95 % CI [56 %, 100 %]) and all 10 7-month-olds (100 %, 95 % CI [69 %, 100 %]) in group 2. The proportion of successful adults was significantly different from the proportion of successful 3-month-olds, χ2(1, N = 20) = 12.80, P < 0.0001, and successful 7-month-olds, χ2(1, N = 20) = 16.36, P < 0.0001. Resampling the infants and adults passing and failing the criterion at the 0.5 % ∆CF using bootstrapping with 1000 replications again showed that the CI for adults (95 % percentile bootstrap CI [0 %, 33 %]) did not overlap with that for the 3-month-olds (95 % percentile bootstrap CI [67 %, 100 %]) or the 7-month-olds (95 % percentile bootstrap CI [100 %, 100 %]). As expected, none of the participants achieved criterion when the ∆CF was 0.
Fig. 4.
A Proportion of participants that reached the pass criterion as a function of CF change in experiment 2. Different colors and symbols represent the three different groups, as indicated in the legend. B Mean d′ (± 1 SEM) values at the 0.50 % ΔCF. Number of participants in each group is shown in parentheses by each symbol.
In a secondary analysis to check for any between-group differences in sensitivity, we assessed mean group d′ values at each ∆CF. Of the participants who progressed to the 0.5 % ∆CF, average 3-month-old and 7-month-old d′ values for the last 10 trials (5 change and 5 no-change trials) were greater than 1 (3 months: N = 9, mean d′ = 1.77, SEM = 0.09; 7 months: N = 10, mean d′ = 1.86, SEM = 0.16). However, the average adult d′ value was less than 1 (N = 4, mean d′ = 0.55, SEM = 0.91) and was not significantly greater than zero (one-sample t-test: t(3) = 0.60, P = 0.59, 95 % CI [− 2.36, 3.46], Cohen’s d = 0.50), consistent with chance-level performance. Note that the data from the one adult participant who reached the pass criterion at the 0.5 % ∆CF with a d′ of 2.49 were included in this average. The average d′ values for this lowest ∆CF are shown in Fig. 4B; average d′ values for all ∆CFs are shown in Fig. 5. As for experiment 1, no systematic differences in sensitivity were observed between groups for those participants achieving the criterion level of performance.
Fig. 5.
Mean d′ as a function of CF change for participants that reached the pass criterion on each ∆CF. The number of participants that contributed to each average is shown by the color-coded numbers. Adult participants who failed to reach criterion are shown in black.
These results rule out the possibility that infants’ superior F0 discrimination found in experiment 1 was due to an inability to perceive spectral changes. Indeed, as with F0, infants’ ability to discriminate small changes in spectral envelope exceeded that of musically untrained adults.
DISCUSSION
A total of 66 infants were tested on F0 and spectral centroid discrimination in the presence of random variation in the other dimension. In experiment 1, 85 % of the infants tested and 73 % of adults with musical training were able to reach the performance criterion at the smallest F0 difference (0.25 %) in the test phase, whereas none of the adult participants without musical training reached this criterion. In experiment 2, 95 % of the infants were able to reach the performance criterion at the smallest CF difference (0.5 %), compared to only 10 % of the adult participants without musical training. These differences between infants and musically untrained adults were highly significant in both experiments, with P < 0.0001 for each of the comparisons.
In terms of our primary research question, to investigate the early development of complex pitch perception, the results showed that infants at both 3 and 7 months performed as well as adults with musical training. This finding is in line with previous studies showing that infants can discriminate F0 and spectral differences in the first few months of life (Clarkson et al. 1988; Clarkson 1996; Montgomery and Clarkson 1997; Plantinga and Trainor 2009; He and Trainor 2009; Lau and Werner 2012). The surprising aspect of this finding is that no age-related improvements in discrimination were observed, as might be expected given the long developmental trajectory of auditory cortex. The fact that 3-month-olds performed as well as 7-month-olds and musically trained adults suggests that F0 and spectral-envelope discrimination is not dependent on a mature thalamocortical pathway or cortex.
The most unexpected finding, and the one that motivated experiment 2, is that more infants reached criterion at the 0.25 % ∆F0 than adults without musical training. Adults often have difficulty distinguishing between F0 and brightness, the aspect of timbre manipulated in this experiment, describing increases in both as being “higher.” Indeed, melodic contours can be recognized when they are conveyed by changes in spectral-envelope peak or CF, as well as by changes in F0 (McDermott et al. 2008). In studies of discrimination, interference between the two dimensions is common (Moore and Glasberg 1990; Borchert et al. 2011; Allen and Oxenham 2014; Lau and Werner 2014), and in functional imaging studies, substantial overlap between representations of pitch and brightness has been reported (Allen et al. 2017, 2019). Musically trained listeners typically demonstrate better F0 discrimination than non-musicians, at least prior to explicit training in the task (Micheyl et al. 2006; Madsen et al. 2019), but even musicians exhibit confusion when discriminating between F0 and spectral peak (Allen and Oxenham 2014). This perceptual interference may be an efficient coding strategy, as the two dimensions often covary in natural sounds (Whalen and Levitt 1995; Kitahara et al. 2005). Indeed, recent studies have demonstrated that relatively brief exposure to covariations in sound features can lead to rapid perceptual learning in adults (Stilp et al. 2010; Stilp and Kluender 2012). Such experiments have shown that adult listeners readily exploit statistical regularity in stimulus attributes to improve task performance. One interpretation of our unexpected results is therefore that infants may not yet be able to exploit the expected statistical covariations between stimulus attributes, and that this inability actually enhances their perceptual performance, relative to musically untrained adults, in situations where expectations of covariation are violated.
A related possible explanation is that adults are more susceptible to confusion induced by variation in the irrelevant dimension. Allen and Oxenham (2014) showed that adult discrimination thresholds were better when the irrelevant variations in F0 or spectral peak were congruent with the change in the target dimension than when they were incongruent. In line with this observation, we would predict that if infants do not take into account the natural statistics of covariation, then they should not show the benefits of congruent variations in F0 and spectrum found for adults and should also not be adversely affected by incongruent variations in the two dimensions. This prediction remains to be tested.
It may also be that infants’ and adults’ processing strategies differ, even when they are faced with the same task. Specifically with F0, it has been shown that different species seem to rely on different acoustic cues to extract F0, with non-primates (e.g., ferrets and chinchillas) relying primarily on temporal-envelope cues provided by spectrally unresolved harmonics (Shofner and Chaney 2013; Walker et al. 2019), whereas humans and possibly other primates seem to rely more on the cues provided by spectrally resolved harmonics (Oxenham et al. 2011; Song et al. 2016). It is conceivable that different weighting of such cues could lead to different patterns of interference, although our current experimental design cannot shed light on this question.
An important point to note is that while all aspects of the stimuli, task, and protocol remained the same between infants and adults, adults were explicitly instructed to raise their hand when they heard the sound change, whereas infants were implicitly conditioned to respond. It is possible that this difference between implicit and explicit task performance is in part responsible for the difference in behavioral performance between the infants and the musically untrained adults. Nevertheless, musically trained adults (who were instructed in the same way as the musically untrained adults) did perform at levels similar to the infants.
Several limitations of the study design restrict our comparisons of F0 and CF discrimination in infants and adults to group-level probabilities. Because we designed the study to detect whether participants in a given age group can or cannot discriminate a given change in F0 or spectral centroid, it is possible that there are more subtle differences in how well 3- and 7-month-olds can discriminate pitch; our study was not sufficiently powered to detect such differences. To obtain a broad measure of whether or not a listener can perceive a given change, we used a pass criterion of at least an 80 % hit rate with a false-alarm rate of at most 20 % on the last 5 change trials and the last 5 no-change trials. This criterion allows for some difference in d′ to exist between listeners who reach criterion (i.e., some participants reached criterion with a d′ greater than 1.68). Moreover, listeners did not hear the same fixed number of trials: poorer performers were given more opportunities (i.e., more trials) to meet the criterion than good performers, who met the criterion after the minimum number of trials.
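For readers who wish to see where the d′ value of 1.68 comes from, the following is a minimal sketch rather than the analysis code used in the study. It assumes the standard equal-variance Gaussian signal-detection model (Green and Swets 1966), in which d′ = z(hit rate) − z(false-alarm rate), and it assumes that trials are independent with fixed response probabilities. The function names, and the example of a listener who responds on half of all trials regardless of the stimulus, are hypothetical illustrations.

```python
from math import comb
from statistics import NormalDist


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Equal-variance Gaussian sensitivity index: z(H) - z(F).

    Rates of exactly 0 or 1 have no finite z-score, so inv_cdf raises
    an error for them; a correction would be needed for such cases.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def pass_probability(p_hit: float, p_fa: float, n: int = 5,
                     min_hits: int = 4, max_fas: int = 1) -> float:
    """Probability of meeting the criterion in one window of n change
    trials and n no-change trials (assumed independent)."""
    p_hits_ok = prob_at_least(min_hits, n, p_hit)
    p_fas_ok = 1.0 - prob_at_least(max_fas + 1, n, p_fa)
    return p_hits_ok * p_fas_ok


# Minimum passing performance: 80 % hits, 20 % false alarms.
criterion_d = d_prime(0.8, 0.2)          # about 1.68

# Hypothetical listener who responds on half of all trials at random.
chance_pass = pass_probability(0.5, 0.5)  # about 0.035

print(f"d' at criterion: {criterion_d:.2f}")
print(f"chance of passing one window by guessing: {chance_pass:.3f}")
```

Under these assumptions, a single 5 + 5 trial window is unlikely to be passed by guessing, but the probability of passing at least once grows as more windows become available, which is one reason the unequal number of trials across listeners is noted as a limitation.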
In terms of stimuli, the filter slope was 12 dB per octave in experiment 1 versus 24 dB per octave in experiment 2; the effect of this difference on infant F0 and CF discrimination is unknown. In experiment 2, we ruled out the possibility that infants are unable to perceive changes in CF; however, it is also possible that infants perceive changes in spectral centroid but perceptually weight CF changes less than F0 changes (e.g., prioritize pitch over timbre) in comparison to adult listeners without musical training. Finally, in experiment 1, we recruited two groups of adults: those with musical training and those without. It is possible that innate differences in auditory or cognitive ability exist between these groups that go beyond the presence or absence of musical training (Corrigall et al. 2013; McKay 2021).
In studies of speech perception, differential sensitivity to native and non-native phonetic contrasts has been shown for adults but not young infants (Werker and Tees 1984). Although there are non-native phonemes that infants can discriminate but adults cannot, adults generally outperform infants on native language speech discrimination (Kuhl et al. 2006). In other music-related studies that tested both infants and adults on comparable tasks such as musical structure and rhythm perception, absolute performance was also generally poorer for infants (Trainor and Trehub 1992; Hannon and Trehub 2005). Remarkably, our results suggest the reverse developmental trajectory, namely that F0 discrimination in the presence of spectral variations, and vice versa, becomes more difficult for adult listeners without formal musical training. This conclusion does not necessarily imply a continuous degradation in F0 discrimination from infancy. Instead, it may be that statistical covariations between F0 and spectrum are learned by 12 months (as are speech and musical regularities, see Werker and Tees 1984; Hannon and Trehub 2005). Thus, perceptual interference between F0 and spectrum may emerge at a relatively early age, degrading performance in older infants (12+ months) on tasks, such as ours, that violate these learned expectations. This prediction remains to be tested. Nevertheless, it is clear from our study that accurate F0 and spectral-peak discrimination can be achieved by 3- and 7-month-old infants, implying that accurate discrimination of these important auditory dimensions is not dependent on complete auditory cortical maturation. This finding suggests either a possible subcortical substrate for both F0 and spectral coding or that an immature auditory cortex is sufficient for F0 and spectral-peak discrimination.
Funding
This work was supported by NIH grants R01 DC00396 and P30 DC04661 to L.A.W.
Declarations
Conflict of Interest
The authors declare no competing interests.
References
- Allen EJ, Burton PC, Mesik J, et al. Cortical correlates of attention to auditory features. J Neurosci. 2019;39:3292–3300. doi: 10.1523/JNEUROSCI.0588-18.2019.
- Allen EJ, Burton PC, Olman CA, Oxenham AJ. Representations of pitch and timbre variation in human auditory cortex. J Neurosci. 2017;37:1284–1293. doi: 10.1523/JNEUROSCI.2336-16.2016.
- Allen EJ, Oxenham AJ. Symmetric interactions and interference between pitch and timbre. J Acoust Soc Am. 2014;135:1371–1379. doi: 10.1121/1.4863269.
- ANSI. American national standard acoustical terminology. New York: American National Standards Institute; 2013.
- Benasich AA, Choudhury NA, Realpe-Bonilla T, Roesler CP. Plasticity in developing brain: active auditory exposure impacts prelinguistic acoustic mapping. J Neurosci. 2014;34:13349–13363. doi: 10.1523/JNEUROSCI.0972-14.2014.
- Bendor D, Wang X. The neuronal representation of pitch in primate auditory cortex. Nature. 2005;436:1161–1165. doi: 10.1038/nature03867.
- Birnholz JC, Benacerraf BR. The development of human fetal hearing. Science. 1983;222:516–519. doi: 10.1126/science.6623091.
- Borchert EMO, Micheyl C, Oxenham AJ. Perceptual grouping affects pitch judgments across time and frequency. J Exp Psychol Hum Percept Perform. 2011;37:257–269. doi: 10.1037/a0020670.
- Buss E, Taylor CN, Leibold LJ. Factors affecting sensitivity to frequency change in school-age children and adults. J Speech Lang Hear Res. 2014;57:1972–1982. doi: 10.1044/2014_JSLHR-H-13-0254.
- Clarkson MG. Infants’ intensity discrimination: spectral profiles. Infant Behav Dev. 1996;19:181–190. doi: 10.1016/S0163-6383(96)90017-X.
- Clarkson MG, Clifton RK, Perris EE. Infant timbre perception: discrimination of spectral envelopes. Percept Psychophys. 1988;43:15–20. doi: 10.3758/BF03208968.
- Corrigall KA, Schellenberg EG, Misura NM. Music training, cognition, and personality. Front Psychol. 2013;4. doi: 10.3389/fpsyg.2013.00222.
- Eggermont JJ, Moore JK. Morphological and functional development of the auditory nervous system. In: Werner L, Fay RR, Popper AN, editors. Human Auditory Development. New York: Springer; 2012. pp. 61–105.
- Green DM, Swets JA. Signal detection theory and psychophysics. New York: Wiley; 1966.
- Hannon EE, Trehub SE. Metrical categories in infancy and adulthood. Psychol Sci. 2005;16:48–55. doi: 10.1111/j.0956-7976.2005.00779.x.
- He C, Trainor LJ. Finding the pitch of the missing fundamental in infants. J Neurosci. 2009;29:7718–8822. doi: 10.1523/JNEUROSCI.0157-09.2009.
- Horn DL, Won JH, Rubinstein JT, Werner LA. Spectral ripple discrimination in normal hearing infants. Ear Hear. 2017;38:212–222. doi: 10.1097/AUD.0000000000000373.
- Houtsma AJM, Smurzynski J. Pitch identification and discrimination for complex tones with many harmonics. J Acoust Soc Am. 1990;87:304–310. doi: 10.1121/1.399297.
- Karzon RG, Nicholas JG. Syllabic pitch perception in 2- to 3-month-old infants. Percept Psychophys. 1989;45:10–14. doi: 10.3758/BF03208026.
- Kitahara T, Goto M, Okuno HG. Pitch-dependent identification of musical instrument sounds. Appl Intell. 2005;23:267–275. doi: 10.1007/s10489-005-4612-1.
- Kohlrausch A, Houtsma AJ. Pitch related to spectral edges of broadband signals. Philos Trans R Soc Lond B Biol Sci. 1992;336:375–381; discussion 381–382. doi: 10.1098/rstb.1992.0071.
- Kuhl PK, Stevens E, Hayashi A, et al. Infants show a facilitation effect for native language phonetic perception between 6 and 12 months. Dev Sci. 2006;9:F13–F21. doi: 10.1111/j.1467-7687.2006.00468.x.
- Lau BK, Lalonde K, Oster M-M, Werner LA. Infant pitch perception: missing fundamental melody discrimination. J Acoust Soc Am. 2017;141:65–72. doi: 10.1121/1.4973412.
- Lau BK, Ruggles DR, Katyal S, et al. Sustained cortical and subcortical measures of auditory and visual plasticity following short-term perceptual learning. PLoS ONE. 2017;12:e0168858. doi: 10.1371/journal.pone.0168858.
- Lau BK, Werner LA. Perception of missing fundamental pitch by 3- and 4-month-old human infants. J Acoust Soc Am. 2012;132:3874–3882. doi: 10.1121/1.4763991.
- Lau BK, Werner LA. Perception of the pitch of unresolved harmonics by 3- and 7-month-old human infants. J Acoust Soc Am. 2014;136:760–767. doi: 10.1121/1.4887464.
- Madsen SMK, Marschall M, Dau T, Oxenham AJ. Speech perception is similar for musicians and non-musicians across a wide range of conditions. Sci Rep. 2019;9:10404. doi: 10.1038/s41598-019-46728-1.
- Mattock K, Molnar M, Polka L, Burnham D. The developmental course of lexical tone perception in the first year of life. Cognition. 2008;106:1367–1381. doi: 10.1016/j.cognition.2007.07.002.
- Maxon AB, Hochberg I. Development of psychoacoustic behavior: sensitivity and discrimination. Ear Hear. 1982;3:301. doi: 10.1097/00003446-198211000-00003.
- McDermott JH, Lehr AJ, Oxenham AJ. Is relative pitch specific to pitch? Psychol Sci. 2008;19:1263–1271. doi: 10.1111/j.1467-9280.2008.02235.x.
- McKay CM. No evidence that music training benefits speech perception in hearing-impaired listeners: a systematic review. Trends in Hearing. 2021;25:2331216520985678. doi: 10.1177/2331216520985678.
- Micheyl C, Delhommeau K, Perrot X, Oxenham AJ. Influence of musical and psychoacoustical training on pitch discrimination. Hear Res. 2006;219:36–47. doi: 10.1016/j.heares.2006.05.004.
- Micheyl C, Oxenham AJ. Sequential F0 comparisons between resolved and unresolved harmonics: no evidence for translation noise between two pitch mechanisms. J Acoust Soc Am. 2004;116:3038–3050. doi: 10.1121/1.1806825.
- Micheyl C, Ryan CM, Oxenham AJ. Further evidence that fundamental-frequency difference limens measure pitch discrimination. J Acoust Soc Am. 2012;131:3989–4001. doi: 10.1121/1.3699253.
- Montgomery CR, Clarkson MG. Infants’ pitch perception: masking by low- and high-frequency noises. J Acoust Soc Am. 1997;102:3665–3672. doi: 10.1121/1.420153.
- Moore BCJ, Glasberg BR. Frequency discrimination of complex tones with overlapping and non-overlapping harmonics. J Acoust Soc Am. 1990;87:2163–2177. doi: 10.1121/1.399184.
- Moore GA, Moore BCJ. Perception of the low pitch of frequency-shifted complexes. J Acoust Soc Am. 2003;113:977–985. doi: 10.1121/1.1536631.
- Moore JK, Guan YL. Cytoarchitectural and axonal maturation in human auditory cortex. J Assoc Res Otolaryngol. 2001;2:297–311. doi: 10.1007/s101620010052.
- Norman-Haignere S, Kanwisher N, McDermott JH. Cortical pitch regions in humans respond primarily to resolved harmonics and are located in specific tonotopic regions of anterior auditory cortex. J Neurosci. 2013;33:19451–19469. doi: 10.1523/JNEUROSCI.2880-13.2013.
- Olsho LW. Infant frequency discrimination. Infant Behav Dev. 1984;7:27–35. doi: 10.1016/S0163-6383(84)80020-X.
- Oxenham AJ, Micheyl C, Keebler MV, et al. Pitch perception beyond the traditional existence region of pitch. PNAS. 2011;108:7629–7634. doi: 10.1073/pnas.1015291108.
- Penagos H, Melcher JR, Oxenham AJ. A neural representation of pitch salience in nonprimary human auditory cortex revealed with functional magnetic resonance imaging. J Neurosci. 2004;24:6810–6815. doi: 10.1523/JNEUROSCI.0383-04.2004.
- Plantinga J, Trainor LJ. Melody recognition by two-month-old infants. J Acoust Soc Am. 2009;125:EL58–EL62. doi: 10.1121/1.3049583.
- Plomp R, Steeneken HJM. Effect of phase on the timbre of complex tones. J Acoust Soc Am. 1969;46:409–421. doi: 10.1121/1.1911705.
- Rosner B. Fundamentals of biostatistics. 4th ed. Belmont, CA: Duxbury Press; 1995.
- Shofner WP, Chaney M. Processing pitch in a nonhuman mammal (Chinchilla laniger). J Comp Psychol. 2013;127:142–153. doi: 10.1037/a0029734.
- Song X, Osmanski MS, Guo Y, Wang X. Complex pitch perception mechanisms are shared by humans and a New World monkey. Proc Natl Acad Sci. 2016;113:781–786. doi: 10.1073/pnas.1516120113.
- Stilp CE, Kluender KR. Efficient coding and statistically optimal weighting of covariance among acoustic attributes in novel sounds. PLoS ONE. 2012;7:e30845. doi: 10.1371/journal.pone.0030845.
- Stilp CE, Rogers TT, Kluender KR. Rapid efficient coding of correlated complex acoustic properties. Proc Natl Acad Sci USA. 2010;107:21914–21919. doi: 10.1073/pnas.1009020107.
- Teller DY. The forced-choice preferential looking procedure: a psychophysical technique for use with human infants. Infant Behav Dev. 1979;2:135–153. doi: 10.1016/S0163-6383(79)80016-8.
- Trainor LJ, Trehub SE. A comparison of infants’ and adults’ sensitivity to Western musical structure. J Exp Psychol Hum Percept Perform. 1992;18:394. doi: 10.1037/0096-1523.18.2.394.
- Trehub SE, Thorpe LA, Morrongiello BA. Infants’ perception of melodies: changes in a single tone. Infant Behav Dev. 1985;8:213–223. doi: 10.1016/S0163-6383(85)80007-2.
- Walker KM, Gonzalez R, Kang JZ, et al. Across-species differences in pitch perception are consistent with differences in cochlear filtering. Elife. 2019;8. doi: 10.7554/eLife.41626.
- Werker JF, Tees RC. Cross-language speech perception: evidence for perceptual reorganization during the first year of life. Infant Behav Dev. 1984;7:49–63. doi: 10.1016/S0163-6383(84)80022-3.
- Werner LA. Observer-based approaches to human infant psychoacoustics. In: Klump GM, Dooling RJ, Fay RR, Stebbins WC, editors. Methods in Comparative Psychoacoustics. Basel: Birkhäuser; 1995. pp. 135–146.
- Whalen DH, Levitt AG. The universality of intrinsic F0 of vowels. J Phon. 1995;23:349–366. doi: 10.1016/S0095-4470(95)80165-0.
- Zatorre RJ, Evans AC, Meyer E, Gjedde A. Lateralization of phonetic and pitch discrimination in speech processing. Science. 1992;256:846–849. doi: 10.1126/science.256.5058.846.



