Proceedings of the Royal Society B: Biological Sciences
. 2020 Mar 25;287(1923):20193010. doi: 10.1098/rspb.2019.3010

Listening to birdsong reveals basic features of rate perception and aesthetic judgements

Tina Roeske 1, Pauline Larrouy-Maestri 1, Yasuhiro Sakamoto 2, David Poeppel 1,3
PMCID: PMC7126030  PMID: 32208834

Abstract

The timing of acoustic events is central to human speech and music. Tempo tends to be slower in aesthetic contexts: rates in poetic speech and music are slower than non-poetic, running speech. We tested whether a general aesthetic preference for slower rates can account for this, using birdsong as a stimulus: it structurally resembles human sequences but is unbiased by their production or processing constraints. When listeners selected the birdsong playback tempo that was most pleasing, they showed no bias towards any range of note rates. However, upon hearing a novel stimulus, listeners rapidly formed a robust, implicit memory of its temporal properties, and developed a stimulus-specific preference for the memorized tempo. Interestingly, tempo perception in birdsong stimuli was strongly determined by individual, internal preferences for rates of 1–2 Hz. This suggests that processing complex sound sequences relies on a default time window, while aesthetic appreciation appears flexible, experience-based and not determined by absolute event rates.

Keywords: temporal structure, preference, auditory sequence, tempo, aesthetic judgement

1. Introduction

Humans engage in at least two behaviours involving the production of temporally complex, hierarchical acoustic structure: speaking and making music. Besides serving other communicative purposes, both speech and music are used in aesthetic contexts [1]. Thinkers in Western (e.g. Aristotle, Baumgarten, Kant, von Helmholtz) and other cultures alike (e.g. Confucius in China, or the author of the Indian Natya Shastra) have tried to understand the phenomenon of aesthetic appreciation by asking what qualifies stimuli to be aesthetically liked and how human cognition underpins aesthetic experience.

Aesthetic preferences are highly variable both across and within cultures, between individuals [2–4], over historical time, and through individual development and repeated exposure [5,6], and depend on a wide variety of factors beyond the stimulus itself [7–9]. Nevertheless, numerous attempts have been made to identify structural features within stimuli that increase their aesthetic quality, such as a ‘golden section’-based image composition, or Berlyne's influential hypothesis that ‘intermediate complexity’ of a stimulus promotes aesthetic appreciation [10–12]. In this study, we adopt this approach to examine the role of acoustic event rate (a stimulus feature) and perceived tempo (a psychological function) in aesthetic appreciation. We test the hypothesis that a general aesthetic preference exists for certain rates or tempo ranges.

There are, indeed, indicators that event rate/tempo may be a target of aesthetic appreciation. Music and speech each feature characteristic temporal modulation spectra, peaking at approximately 2 Hz for music versus 4–5 Hz for speech [13,14]. Interestingly, poetic speech tends to be slower than prose [15,16] (rates of 5.1 versus 5.3 syllables s−1 for poetic versus non-poetic speech). Speech rate also systematically varies with emotional valence, which is often central to aesthetic production [17] and with speaker proficiency [18,19]. Particular rates can thus carry information that is relevant in aesthetic contexts. This might be caused by a general aesthetic preference for some rates over others.

A prerequisite to rate-based aesthetic judgement is the ability of the human auditory system to process a range of acoustic rates. Indeed, auditory processing has developed functional specializations to track acoustic structure on multiple time scales [20,21] and within specific temporal windows [22,23], which may make it highly sensitive to different tempos/event rates. Another perceptual specialization makes us perceive regular event sequences between about 10 Hz and 0.5 Hz as ‘having a tempo’ and extract a commensurate pulse (reviewed by [24]). Such perceptual specializations might translate into differential aesthetic judgements about different rates.

In this study, we explore the relationship between event rates in complex acoustic stimuli, perception of their tempo and their perceived aesthetic quality. We propose three alternative theoretical scenarios (figure 1a,b). In the first two, the aesthetic perception of acoustic sequences is a special case of general perception, with some rates yielding more aesthetic engagement than others.

Figure 1.

Possible relationship between acoustic rates/perceived tempos and aesthetic appreciation, and their predicted influence on experimental outcome. (a) Acoustic events unfold at an indefinite range of tempos/event rates, the total of which is symbolized by the large dark circle. Some rates are perceivable to human listeners (lighter circle). Perception is biased instead of uniform across the perceivable rates (symbolized by irregular shape in head). (b) Some perceived rates may be inherently more aesthetic than others (1 and 2), either because of an independent aesthetic rate preference (e.g. for intermediate rates for their ease of processing (1)), or based on experience (e.g. natural, typical birdsong rates would be preferred for birdsong (2)). Alternatively, aesthetic appreciation is independent of absolute event rate (3). (c) Predicted effect of the three scenarios on experimental outcome. An aesthetic preference for intermediate rates would result in narrower preferred than presented rates (1), an experience-based preference in overlapping natural-sounding and preferred rates (2) and an aesthetic preference for which absolute rates are irrelevant in similar presented and preferred rates (3). Natural-sounding rates may differ from depicted (no claims are made by preference scenarios about naturalness categorization; hence dashed lines). (Online version in colour.)

Two possible sources of such a rate preference could be ease of processing and experience. In scenario 1 (figure 1b, (1)), ease of processing determines preferred rates, resulting in a preference for intermediate over slower rates, at which information arrives too slowly for effortless integration, and over faster rates, at which bits of information are hard to resolve. This scenario is in line with the assumption of ‘intermediate complexity' increasing aesthetic appreciation [11,25,26]. In scenario 2 (figure 1b, (2)), experience determines appreciation. This results in a preference for the most typical rates of a stimulus kind (i.e. natural rates in the case of natural sounds). This scenario is supported by the well-established fact that experience affects aesthetic appreciation [27]. In scenario 3 (figure 1b, (3)), no aesthetic bias across perceivable rates exists: all perceivable rates can also qualify as aesthetic. Note that this includes the possibility that aesthetic perception is entirely neutral with respect to tempo/rate: What makes us perceive something as aesthetically valuable may not be a function at all of rate as a basic parameter.

With these scenarios in mind, we asked human listeners to manipulate the speed of birdsong stimuli and find a tempo they aesthetically prefer, or—for comparison—a tempo they deem natural.

Using birdsong as a stimulus was motivated as follows. First, birdsong is a natural kind, and growing evidence suggests that human observers rate natural kinds more similarly than artefacts (e.g. [28,29]). Second, in contrast with musical or poetic material, birdsong is not shaped by human motor/production and auditory/processing constraints. Third, birdsong has always been a target of human aesthetic appreciation, and its acoustic structure resembles human aesthetic acoustic stimuli (i.e. speech and music) in its complex temporal progression and hierarchical structure. Note rates of birdsong overlap with temporal modulation spectra of human stimuli, but extend into higher ranges not normally encountered in music (see example rates in figure 2b).

Figure 2.

Bird songs are acoustic sequences with complex temporal structure. (a) Sonogram (top panel) and waveform (bottom panel) of a thrush nightingale's song. Coloured bars indicate grouping of repeated notes into subphrases on two hierarchical levels. (b) Rate density in common middle European songbirds. Rate densities are species specific and faster than modulation spectra of human music which peak around 1.5–6 Hz (167–667 ms intervals) [13]. Birdsong stimuli used in this study were from two species, European and thrush nightingales, whose rates peak around 6–8 Hz (125–167 ms intervals). (Online version in colour.)

The experiment tests the hypotheses behind the three possible scenarios in figure 1. To test for aesthetic preference across a broad range of note rates (about 2–100 Hz/10–500 ms intervals), we presented the birdsong stimuli at randomized tempos (50–200% of original tempo; figure 3b; presented rates shown in figure 4). The presented songs thus had a considerably broader rate range than the originals. To understand what rates our subjects may have attended to while choosing a playback tempo, we asked participants in a second step to extract tempo by fitting a click train to the looped stimulus (figure 3c). We compared this extracted tempo to both the temporal features of the stimulus and listeners' preferred spontaneous tempo recorded in the absence of any stimulus (figure 3a).

Figure 3.

Experimental procedure. The six experimental blocks (bottom) each start with the listener selecting a preferred tempo for a click train (a), followed by eight trials of (b) selecting a birdsong playback tempo and (c) extracting its tempo by adjusting a click train to the perceived song tempo. At initial presentation—before exploration starts—tempos are pseudo-randomized between eight tempos of 50–200% of original tempo for birdsong stimuli, and 1.1–6.6 Hz (150–1000 ms intervals) for click trains. Participants are asked, in alternating blocks, to pick playback tempos they personally prefer, or playback tempos that sound natural to them. (Online version in colour.)

Figure 4.

Neither preferred- nor natural-tempo decisions for birdsong stimuli are biased towards a specific rate range. (a) Illustration of how note-to-note times are calculated (i.e. as onset-to-onset times). (b) Across listeners and stimuli, preferred note-to-note times (dark blue) were not different from natural-sounding note-to-note times (light blue). (c) Preferred and natural-sounding note rates, as calculated in a sliding 500 ms time window (pooled for preferred and natural-sounding stimuli), were similar to first presentation tempo. (Online version in colour.)

Expected experimental outcomes for the three theoretical scenarios are (figure 1c):

  • (i) If listeners have a general preference for intermediate rates, they will prefer a narrower rate range than the one they are presented with (figure 1c (1)), namely some sub-range of the rates perceived as ‘having a tempo', i.e. 0.5–10 Hz (= intervals of 100–2000 ms).

  • (ii) If aesthetic preference is determined by previous experience, i.e. rates of familiar, common, frequently heard stimuli, then preferred rates will resemble natural-sounding rates (figure 1c (2)): listeners will appreciate what they consider natural birdsong.

  • (iii) If aesthetic preference is indifferent to absolute rate, presented and preferred rates will be similar. No prediction can be made for natural-sounding rates in this case (figure 1c (3)): they may or may not differ systematically from presented rates, depending on whether listeners hold specific rate expectations for natural kinds.
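Rates (Hz) and inter-event intervals (ms) are used interchangeably throughout; the mapping is simply the reciprocal, scaled by 1000 for milliseconds. A minimal helper (not part of the original study, included only to make the quoted conversions easy to check):

```python
def hz_to_ms(rate_hz: float) -> float:
    """Event rate (Hz) -> inter-event interval (ms)."""
    return 1000.0 / rate_hz

def ms_to_hz(interval_ms: float) -> float:
    """Inter-event interval (ms) -> event rate (Hz)."""
    return 1000.0 / interval_ms

# the 'having a tempo' range quoted above: 0.5-10 Hz
print(hz_to_ms(10.0), hz_to_ms(0.5))  # 100.0 2000.0
```

For example, the biased extracted-tempo range reported later, 1.1–2 Hz, corresponds to intervals of roughly 500–900 ms by the same reciprocal relation.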

2. Material and methods

(a). Participants

Of 48 participants who completed the experiment, 7 were discarded for failing our quality criteria (see section ‘Quality checks') and one for technical reasons, leaving 40 for analysis (29 women), age 20–50 years (M = 30.6, s.d. = 10.3). Body height was 159–201 cm (M = 173.63, s.d. = 9.61), leg length 88–118 cm (M = 100.06, s.d. = 6.01). Participants were not recruited on the basis of having any particular level of musical, auditory/rhythmic or bird-related expertise. We assessed these with a questionnaire after the experiment (electronic supplementary material, S4). Our sample's expertise in both music and birdsong was low to medium (music: M = 1.4 of 5 possible points, s.d. = 1.4; birdsong: M = 43 of 99 possible points [see questionnaire], s.d. = 17).

(b). Material

(i). Original material

18 bird songs—9 from thrush nightingales (Luscinia luscinia), 9 from European nightingales (Luscinia megarhynchos)—were selected from recordings of 11 wild birds singing in a natural environment. Five recordings are from the xeno-canto database (www.xeno-canto.org; recording numbers XC178171, XC75464, XC214082, XC29600 and XC83226). Six recordings were kindly provided by Marc Naguib (Wageningen University) and Philipp Sprau (Ludwig Maximilian University Munich).

Songs were pre-processed to normalize sound intensity level using R128Gain (http://r128gain.sourceforge.net; GNU General Public License). In order to minimize sound artefacts during manipulation, recordings were digitally de-noised using Magix Sequoia 13. Song stimuli are provided as electronic supplementary materials.

Using Sound Analysis Pro 2011 [30], GoldWave (v6.18, Goldwave Inc, 2015) and Matlab 2014b (The MathWorks, Inc., Natick, Massachusetts, United States), we segmented the bird songs into notes and extracted mean note frequency and Wiener entropy.

Through visual sonogram inspection and listening, we identified hierarchical groups (subphrases) of repeated notes and calculated for each stimulus:

  — note-to-note time intervals;

  — note rates in a sliding 500 ms window (i.e. note rate density); and

  — subphrase rates.
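The first two measures can be sketched from a list of note onset times. This is an illustrative reconstruction, not the authors' code: the function names, the 50 ms hop size, and the example onsets are assumptions; only the 500 ms window is given in the text.

```python
import numpy as np

def note_to_note_intervals(onsets_ms):
    """Onset-to-onset intervals between successive notes (ms)."""
    onsets = np.asarray(onsets_ms, dtype=float)
    return np.diff(onsets)

def rate_density(onsets_ms, window_ms=500.0, step_ms=50.0):
    """Note rate (notes per second) in a sliding window over the song."""
    onsets = np.asarray(onsets_ms, dtype=float)
    starts = np.arange(onsets.min(), onsets.max() - window_ms + step_ms, step_ms)
    rates = []
    for t0 in starts:
        n = np.sum((onsets >= t0) & (onsets < t0 + window_ms))
        rates.append(n / (window_ms / 1000.0))
    return np.array(rates)

# hypothetical onsets: a regular 8 Hz note train (notes 125 ms apart)
onsets = np.arange(0, 2000, 125)
print(note_to_note_intervals(onsets)[:3])  # [125. 125. 125.]
print(rate_density(onsets).max())          # 8.0
```

Subphrase rates follow the same logic once repeated notes are grouped into subphrases, using subphrase onset times instead of note onsets.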

At original tempo, song length was 2514–6497 ms (M = 4339, s.d. = 1295). Note rates overlapped between the two species (figure 2b; rates were 5.72–18.38 Hz; M = 10.10, s.d. = 3.22) and did not differ significantly (t-test for independent samples, t16 = 1.52, p = 0.148). The species did not differ significantly in the number of notes per song, song duration, note rate, Wiener entropy of notes or mean frequency of notes. Only a higher-order organizational principle differed significantly between species, the grouping of similar notes into subphrases (figure 2a): in thrush nightingales, subphrases were longer (t16 = 5.47, p < 0.001) but their number per song was lower (t16 = 7.82, p < 0.001) than in European nightingales (approx. 6 subphrases of 905 ms versus approx. 12 subphrases of 340 ms per song).

(ii). Presentation and manipulation

Stimuli were presented and subjects' choices logged using the MAX 7 software by Cycling '74 for Windows (version 7.2.4). As input interface, we used a Griffin PowerMate USB multimedia controller (rotary dial device) with the PowerMate manager (v2.0.1) driver for Windows. Dial position was sampled continuously at 20 Hz (i.e. every 50 ms). The latency of the input device was well below the sampling interval (maximum ±10 ms).

For song tempo manipulation, we used a software patch (free_elastique) for MAX (www.devinkerr.com). Its time-stretching algorithm allowed real-time stretching between 20% and 300% of the original stimulus duration without affecting pitch, while avoiding sonic effects/artefacts as much as possible. Three of the co-authors and a sound engineer independently validated the stimulus set.

Available click train rates were 0.5–40 Hz (inter-click intervals, ICIs, of 25–2000 ms). The rate of change during manipulation followed a logarithmic scale (i.e. tempo doubled at equal distances on the dial).
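A logarithmic dial mapping of this kind can be sketched as follows. The normalized dial range, endpoint rates, and function name are illustrative assumptions, not details of the MAX patch; the defining property is that equal dial distances multiply tempo by a constant factor.

```python
import math

def dial_to_rate(position, rate_min=0.5, rate_max=40.0):
    """Map a normalized dial position (0..1) to a click rate (Hz)
    on a log scale: equal dial distances multiply the rate by a
    constant factor, so tempo doubles at equal distances."""
    return rate_min * (rate_max / rate_min) ** position

# moving the dial by this fixed distance always doubles the rate,
# regardless of the starting position
step = 1.0 / math.log2(40.0 / 0.5)
ratio = dial_to_rate(0.3 + step) / dial_to_rate(0.3)
print(round(ratio, 6))  # 2.0
```

A linear mapping would instead add a constant number of Hz per degree, making the same dial movement feel much larger at slow tempos than at fast ones; the log scale keeps perceived tempo change uniform across the range.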

Stimuli were presented through headphones (DT 770 Pro 80 Ohm, Beyerdynamic) as mono sounds (44 100 Hz, 16-bit). Volume level was kept identical across participants.

(iii). Experimental procedure

In a group testing room, 1–8 participants per session were introduced to the tempo dial and the instructions. They were asked to manipulate the playback speed of birdsongs to explore different tempos before settling on either their ‘preferred' tempo or the one that seemed most ‘natural', in a blocked design (figure 3).

Preceding each of the six experimental blocks was a trial in which the subjects selected with the dial a click tempo that felt most agreeable to them, in the absence of birdsong (figure 3a). Initial tempo for this click train was 1.3 Hz (760 ms ICI). This task was followed by eight trials that consisted of (i) selecting a birdsong tempo and (ii) extracting the perceived stimulus tempo (at chosen playback speed) by fitting a click train to the looped stimulus (while stimulus and clicks were playing at the same time; turning the dial changed only click tempo; figure 3b,c). Initial tempos for both song stimuli and click trains were pseudo-randomized (from eight tempos of 50–200% of original tempo for birdsong, resulting in local rates of approx. 2–100 Hz (figure 4), and 0.75–20 Hz for clicks). Experimental blocks alternated between ‘preference' and ‘naturalness' tasks; the first block was ‘preference' for one half and ‘naturalness' for the other half of the participants. Within a block, the order of stimuli was pseudo-randomized across participants. Sessions took about 1 h 30 min.

To ensure that the tasks were within participants' abilities, participants were asked to report task difficulty on a scale from 1 (easy) to 5 (difficult) as part of the questionnaire at the end of the session (electronic supplementary material, S4). The two manipulation tasks (i.e. naturalness and preference tasks) seemed similarly difficult (Mnat = 2.80, s.d.nat = 1.11 and Mpref = 2.60, s.d.pref = 1.06), whereas extracting tempo appeared more difficult, Mclick = 3.68, s.d.click = 0.94 (repeated-measures ANOVA, F2,38 = 15.67, p < 0.001, ηp2 = 0.452; Bonferroni-corrected post hoc tests, p < 0.001 for click versus naturalness and click versus preference), albeit not extremely difficult (ratings of 1 and 2 also occurred).

(iv). Measurements and analyses

All analyses were carried out in Matlab 2014b and SPSS 13 and consisted of extracting variables descriptive of the birdsong manipulation (relative change from original tempo; note and subphrase durations/rates during tempo manipulation). For click train manipulations, ICIs were extracted. These variables were examined with paired-sample t-tests, one-sample t-tests and Pearson correlations, across participants or stimuli, as appropriate. Trajectories of the dial manipulation were retrieved and visually examined.

(v). Quality checks

Data were submitted to analysis only if participants

  — spent at least 14 s listening to the stimulus, such that they heard at least two looped renditions (1 participant failed);

  — proved their ability to use the dial by matching a click train to a second click train using the dial before the first experimental block (3 participants failed); and

  — explored different stimulus tempos, deviating more than 40 degrees (mean across stimuli) from the initial dial position (3 participants failed).

3. Results

(a). Aesthetic judgement is not biased towards specific event rates

We first investigated whether listeners (n = 40) selected different rates as preferred versus sounding natural. We found that selected note rates (calculated from onset-onset time intervals; figure 4a) were almost perfectly overlapping between the ‘naturalness' and ‘preference' choices (figure 4b), and did not significantly differ (paired sample t-test, t39 = −1.79, 95% CI [−1.03, 0.06], p = 0.081). As a consequence, preference and naturalness ratings are not distinguished for further analyses.

Surprisingly, the overall range of note rates did not systematically change between initial presentation and final tempo decision (figure 4c; paired sample t-test, t39 = 1.829, p = 0.075). This was despite listeners turning the dial (participants with mean deviations under 40 degrees from initial dial position were excluded; see Material and methods). After exploring different stimulus tempos, they settled on a tempo close to the one they first heard—a compelling and unexpected primacy effect. Note that due to our randomization of initial presentation tempo (50–200% of original tempo), selected birdsong rates (figure 4) were considerably broader than the rates of original bird songs: subjects both judged as natural and preferred a much broader tempo range than covered by the original birdsong stimuli.

These results suggest that aesthetically appreciating an acoustic stimulus (or considering it as natural) is not strongly determined by absolute event rates. This is most consistent with scenario (3) in figure 1b: we find no evidence that subjects appreciated (or considered natural-sounding) a particular sub-range of rates that is narrower than what we presented.

(b). Selected birdsong tempos resemble initial presentation tempo

Birdsong tempo selection did not deviate from the initial presentation tempo in any consistent direction. Beyond suggesting an absence of rate biases, this result is in line with the subjects forming an immediate, implicit auditory memory of what they initially heard, maintaining this memory while exploring different tempos that mask the initial percept, and eventually returning to a tempo similar to it. This is reminiscent of known primacy effects in memory for the first of several repeated stimulus presentations [31,32].

In order to determine how robust this memory is, we analysed a posteriori whether it survives longer masking by different stimuli over several minutes. We re-analysed trials with a ‘repeated stimulus' that had already occurred in a previous trial with a different presentation tempo (12 of 48 trials per subject). Concretely, we examined whether the presentation tempo on first presentation correlated with the selected tempo on the second trial, as this would point to a robust memory for the stimulus at first exposure. Indeed, the second selected tempo was significantly correlated with the first presented tempo of the repeated stimuli (r240 = 0.160, p = 0.013). As a control, we checked for correlation between the second presented tempo with the first selected tempo, and found none (r240 = −0.005, p = 0.937)—as expected, because the participants could not know the second presentation tempo at the time of selecting the first.

The return to initial tempo might, in addition, be related to well-described effects of familiarity on appreciation (e.g. [26,33,34]). Indeed, when participants spent more time per trial close to initial tempo, they eventually selected a tempo close to it, and in turn, when they had a broader tempo exploration range during the trial, they eventually chose tempos farther away (details and statistics: electronic supplementary material, S1).

Together, our findings are in line with the following scenario: the quick formation of a lasting auditory memory of the temporal structure of a novel stimulus determines a stimulus-specific aesthetic tempo preference. Familiarity with the tempo enhances preference strength.

(c). Extracted perceived tempo reveals a strong bias towards slow tempos between 1.1 and 2 Hz

The perceived tempos that were extracted from the birdsong stimuli by fitting a click train showed a strong bias towards rates around 1.1–2 Hz (= intervals of 500–900 ms; figure 5). Extracted tempos were thus strikingly different from note rates in the birdsong (figure 5a, histogram). This suggests that the cognitive operation of forming a tempo percept of birdsong does not directly reflect absolute note rates. This is similar to tempo extraction from music: musical tempo is perceived on the level of the beat, which can be considerably slower than note rate.

Figure 5.

Tempos extracted from birdsong reflect subjects' independently preferred tempo, but do not consistently relate to stimulus rates. (a) Left panel: blue markers represent median note-to-note intervals of each bird song at selected against presented tempo (‘natural-sounding' dark blue, ‘preferred' light blue). Dark and light red markers represent tempo extracted from birdsong stimuli (as ICIs [ms], jitter added to the presented ICIs). Data concentration in red vertical ‘lines' results from the eight start tempos. Birdsong tempo choice strongly correlates with presentation tempo. Extracted (click) tempo is strongly biased towards ICIs around 500–900 ms (1.1–2 Hz), independent of presentation tempo. Right panel: histograms of extracted tempo ICIs and mean inter-note intervals show that listeners extracted tempos slower than note rates. (b) Decision trajectories in terms of median intervals between notes (for birdsong) and ICIs (for extracted tempos) from trial start to end (in % of trial duration). Shaded areas: 25th–75th percentiles; thick lines: median. Within the first 30% of a tempo extraction trial, subjects moved ICIs towards the target range, while decision trajectories for song tempos do not systematically change over trial time. (c) Top: schematic illustrating three measures of birdsong temporal structure: inter-note times (red), inter-subphrase times (pink), slowest inter-note times (purple). Bottom: none of these measures is strongly correlated with extracted tempo. (d) Independently preferred click tempo in the absence of birdsong (mean per subject, x-axis) and extracted tempo (averaged per subject, y-axis) are strongly correlated (r40 = 0.639; p < 0.001, linear fit in grey). (e) Histograms of extracted tempos by subject, sorted by median ICI (red markers). (Online version in colour.)

Moreover, tempos extracted from birdsong were independent of initial click tempo (figure 5a; Pearson correlation r40 = 0.197, p = 0.223, paired sample t-test, t39 = 7.12, p < 0.001). The cognitive operations involved in ‘extracting a tempo' were thus unaffected by the heard clicks, quite in contrast with the operations involved in ‘picking a playback tempo’, which were strongly biased by presented tempo.

Finally, tempo decisions for fitted click trains were preceded by characteristic exploration/decision trajectories that differed from those of birdsong tempo choices (figure 5b): within the first 30% of a click train trial, subjects adjusted ICIs towards their eventual target range, and used the remaining time only for fine-tuning. Such a trajectory likely reflects confidence in the decision, i.e. a strong bias towards the chosen range, despite the task being relatively more difficult than manipulation of the birdsong (see Methods). Birdsong exploration trajectories, by contrast, showed no consistent drifts over trial time. They further corroborate that no strong preference for any sub-range of presented rates drove birdsong tempo choices.

(d). Tempo extracted from birdsong reflects independent individual tempo preference better than stimulus temporal structure

To identify which temporal properties of birdsong drove the listeners' tempo percept, we first looked for correlations of extracted tempo ICIs with median inter-note interval, and with slowest inter-note interval (a salient stimulus property). However, we did not find any evidence for either of these relationships (p > 0.05; figure 5c). We also examined the relation between extracted tempo ICIs and median subphrase duration (another salient stimulus feature; figure 5c). While participants did not seem to systematically match subphrases (see large pink cloud-like data cluster), we found a (weak) relation between subphrase duration and extracted click tempo when examined at the stimulus level (the slower the subphrase rate, the slower the click train).

However, a different variable seemed to predict extracted tempo even better: independent tempo choice for clicks in the absence of birdsong (figure 3a). This independently preferred tempo strongly correlated with stimulus-extracted ICI (r40 = 0.639, p < 0.001; figure 5d, see also electronic supplementary material, figure S2-1). Also, independently preferred click tempo and extracted tempos covered the same tempo range (no statistical difference in paired sample t-test, t39 = 0.11, p = 0.913).

Which of these two factors—subphrase duration (a temporal property of the stimulus) versus independent tempo choice (an individual characteristic)—is the better predictor of extracted tempo? To determine this, we conducted a multiple linear regression analysis (forced entry procedure) modelling the relationship between extracted tempos and the two potential explanatory variables, median subphrase duration and independently preferred tempo. We identified a significant regression equation (F2,37 = 12.818, p < 0.001, R2 = 0.409), but only independently preferred tempo was a significant predictor of extracted tempo: extracted ICIs increased by 0.693 ms for each 1 ms increase in independently preferred ICI.
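The regression logic can be illustrated with ordinary least squares on synthetic data. This is a sketch only: numpy stands in for SPSS's forced-entry procedure, and all numbers below are fabricated for illustration, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40  # one mean extracted ICI per participant, as in the study

# hypothetical predictors (ms): median subphrase duration and
# independently preferred ICI; extracted ICI depends only on the latter
subphrase = rng.uniform(300, 900, n)
preferred = rng.uniform(400, 1000, n)
extracted = 100 + 0.7 * preferred + rng.normal(0, 60, n)

# forced entry = OLS with both predictors (plus intercept) entered at once
X = np.column_stack([np.ones(n), subphrase, preferred])
beta, *_ = np.linalg.lstsq(X, extracted, rcond=None)

pred = X @ beta
r2 = 1 - np.sum((extracted - pred) ** 2) / np.sum((extracted - extracted.mean()) ** 2)
# beta[2] recovers the preferred-tempo slope (~0.7); beta[1] stays near 0,
# mirroring the finding that only preferred tempo predicts extracted tempo
```

In this setup, R2 reflects how much variance the two predictors jointly explain, while the individual coefficients show which predictor carries that explanatory power.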

Even though participants' extracted tempos were correlated with independently preferred tempo, the task they were asked to solve was selecting the tempo perceived in the stimulus, and they spent time calibrating click tempo to their stimulus tempo percept. Participants might therefore have tried to match (within their independently preferred tempo range) some salient temporal properties of the song stimulus (other than the median subphrase duration/note length/slowest note duration that we had already analysed). We explored two more possibilities: time-locking of extracted tempo clicks to (i) note onsets and (ii) subphrase onsets (electronic supplementary material, figure S3-1). However, we observed no consistent time-locking of extracted tempo clicks to any onsets in the stimuli.

The extracted tempo choices thus appeared to rely on an individual, independently preferred tempo rather than on actual temporal properties of the stimulus, of which we tested a broad range.

Note that click tempo did not correlate with physical or personal traits of our subjects, including body size, leg length, expertise with music or birdsong or with perceived task difficulty.

4. Discussion

(a). No aesthetic preference for absolute rates

Our participants' aesthetic judgement was not biased towards any particular rates: listeners appreciated all presented tempos, including not only rates exceeding typical human rates, but even ranging from half to double the original birdsong tempos. Our data thus suggest that the slower rates of music and poetic speech are not accounted for by a general preference for those rates. Instead, our results support scenario (3) in figure 1b: aesthetically preferred rates in a complex acoustic stimulus like birdsong were neither biased towards intermediate rates (scenario (1)), nor towards a narrowly defined range of what listeners expect as typical birdsong rates (scenario (2)), nor were they limited to typical human rates. Instead of generally preferring particular absolute rates, our participants developed highly stimulus-specific rate preferences based on exposure. Our results align well with recent findings by Van Hedger et al. [35] that aesthetic preferences for natural sounds (including birdsong) do not seem to be inherently tied to various acoustic features (e.g. centroid, entropy).

(b). Stimulus-specific rate preferences formed rapidly upon exposure

We observed a correlation between tempo choice of repeatedly presented bird songs and presentation tempo at first exposure: when a stimulus was presented repeatedly throughout the experiment, the selected tempo at the second rendition resembled the presented tempo at the first rendition (but not the other way around). This finding points to a prominent role of memory in the tempo decision: listeners seem to form an auditory memory at initial exposure, recognize the same stimulus (presented at a different tempo) after several other trials and try to match the memorized temporal pattern. Rapid auditory memory formation has been described previously (e.g. memories of random noise patterns can be induced through simple reoccurrence in an attended noise stream [36,37]). Moreover, temporally exact memories have been described for musical stimuli [38–40]. Our results corroborate that (i) such memories for complex acoustic stimuli are implicitly formed without any task requiring them, (ii) they are remarkably exact with respect to temporal structure, (iii) they are robust against subsequent masking and (iv) they determine future temporal preferences/expectations in a stimulus-specific way (they are not generalized across stimuli of the same natural kind).

The data are reminiscent of two well-described psychological phenomena: primacy effects on memory and familiarity effects on stimulus appreciation. Both could be related to the tempo decisions we observed. Primacy effects on memory refer to the finding that when subjects are presented with different variants of a stimulus in succession, they tend to memorize the first variant best [31,32]. In our case, subjects may have memorized the initial tempo at first presentation better than tempos heard later, which would have rendered the initial tempo subjectively most familiar to them. Familiarity, in turn, affects appreciation, as repeatedly shown for musical (and other) stimuli (see [25,26,34] for an overview). Our data are in line with subjects preferring the most familiar stimulus version, with the familiarity of the initial presentation tempo possibly enhanced by primacy effects on memory. Note, though, that our study design does not allow us to quantify the contribution of primacy: we did not control exposure to different tempos during exploration, so the first tempo might also have been the tempo that the participant listened to the longest.

In sum, our findings are suggestive of a mechanism by which rapid, stimulus-specific memory formation affects liking, an effect strengthened through repeated exposure. We expect this to be a general mechanism applying not only to natural sound sequences but to any sound sequences, including music and speech, and possibly underlying the well-documented initial increase of liking with the familiarity of music. Testing this hypothesis with the proposed methods using musical/speech stimuli, however, is challenging: the familiarity of such stimuli is hard to control, as adults have stored a high number of familiar musical and speech patterns (rhythms, melodies, timbres, rhymes, etc.) that would overlap with the proposed stimuli and affect their appreciation in uncontrollable ways. To determine whether rapid auditory memory-supported appreciation does generalize to music and poetic speech, our method might prove most useful when either using highly unfamiliar music/speech stimuli (e.g. from an unfamiliar culture/language) or tightly controlled artificial stimulus classes. Another promising direction would be to investigate young children with limited musical/poetry experience, as any appreciation-enhancing effects of rapid auditory memory formation should be easier to detect when interference from older memories is low.

(c). Tempo extracted from a complex acoustic stimulus resembles ‘preferred tempo' and temporal windows for chunked processing

We designed the task of extracting the perceived tempo of birdsong in order to reveal what time scales subjects attended to when settling on a stimulus tempo. This seemed important given that our hierarchically organized bird songs cover multiple time scales.

However, our participants settled on click tempos that represented their own internally preferred tempo—rather than any of a number of manifest temporal stimulus properties we tested. Extracted tempos strongly converged on a range corresponding to a well-described pace known as ‘preferred', ‘internal' or ‘personal tempo' (reviewed in [41]) that biases both perception and production [42–44]: during perception of simple sound sequences, adult listeners judge time intervals around 600 ms as ‘not too fast and not too slow' [42,45], and they produce similar intervals during spontaneous rhythmic motor activity. Preferred tempo has been associated with internal clock speed, representing the intrinsic period of an internal oscillator serving entrainment, or a general cognitive tempo (‘pace of mental activity' [41]). Our measure of independently preferred click tempo closely mirrors previous measures of preferred perceptual tempo [45,46]. As in those studies, our individuals differed in their preferred tempo [45] but were remarkably consistent intra-individually [47].

Do participants actually hear their own, inherent tempo when listening to a quasi-periodic natural stimulus? This would be a possible (and interesting) explanation of our results. However, we cannot tell how strong such a top-down imposition of own tempo would be in real life: perhaps our subjects ‘extracted' slow tempos not because they actually corresponded to a conspicuous tempo percept, but just reverted to their internal ‘default' tempo as a heuristic when faced with the challenging task of extracting a tempo from a not strictly periodic stimulus. While we cannot exclude this possibility, note that our subjects did not report great difficulty with the task (see Material and methods). We therefore assume they confidently solved the task—extracting a tempo—and that their individual internal tempo indeed determined perceived stimulus tempo to a large extent.

Birdsong is a stimulus that is quasi-periodic but has no clear beat. Cautiously assuming that our subjects' slow ‘extracted tempos' are representative of actual tempo percepts, their strong similarity to individually preferred tempo might point to a default temporal processing window for any quasi-periodic acoustic stimuli: a top-down perceptual chunking process might segment the acoustic stream into blocks of around 500–900 ms. This is reminiscent of a cortical specialization for processing 500 ms sound chunks of speech [48]. A similar top-down chunking process has also been postulated based on how cortical oscillations track complex acoustic streams that contain information on multiple time scales [49]: chunking in temporal windows greater than 200 ms best explains the recognition of tones embedded in (non-natural) modulated sounds, as well as the concurrent oscillatory brain activity.

Interestingly, the window size of our participants' extracted tempos also resembles typical musical beats [13] and corresponds to a particularly salient tempo in music [14]. Our results suggest that this window size might work as an internal tempo ‘prior', even in the case of a non-human and not strictly periodic auditory stimulus: listeners perceived an internal top-down time window of about 500–900 ms as fitting the birdsong, even though it was largely independent of temporal stimulus properties.

Nevertheless, the extent to which extracted click tempos reflect an architectural attribute of auditory tempo processing, rather than a heuristic for solving a difficult auditory task, requires further investigation.

5. Conclusion

Using the tempo of birdsong as a window into auditory aesthetic perception, we found that absolute event rates neither played a role in aesthetic judgements nor did they determine a more general cognitive classification of a stimulus (here as ‘natural birdsong'). At the same time, our data suggest that upon hearing a novel stimulus, listeners rapidly form a robust, implicit memory of its temporal (and probably other structural) properties and develop a stimulus-specific preference for the memorized tempo over other possible tempos. By contrast, the perception of a tempo—which is a central aspect of processing musical and speech stimuli—turned out to be strongly determined by individual, internal preferences for tempos peaking around 1–2 Hz—and in the case of birdsong, largely independent of actual temporal stimulus properties.

Supplementary Material

Supplementary Materials 1-4
rspb20193010supp1.pdf (2.1MB, pdf)
Reviewer comments

Ethics

The experimental procedure was approved by the Ethics Council of the Max Planck Society.

Data accessibility

Stimulus material can be accessed via Edmond (the Open Access Data Repository of the Max Planck Society) under the title ‘Aesthetic rate perception of birdsong': https://edmond.mpdl.mpg.de/imeji/collection/6YlSulV5BEP_Cdj.

Authors' contributions

T.R., P.L.M. and D.P. were involved in conceptualization. T.R., P.L.M. and Y.S. were involved in methodology and analysis. T.R. and P.L.M. were involved in writing of the manuscript and figure preparation. T.R., P.L.M., Y.S. and D.P. were involved in review and editing and in final approval.

Competing interests

We declare we have no competing interests.

Funding

We received no funding for this study.

References

  • 1. Turner F, Pöppel E. 1983. The neural lyre: poetic meter, the brain, and time. Poetry 142, 277–309.
  • 2. Vessel EA, Starr GG, Rubin N. 2013. Art reaches within: aesthetic experience, the self and the default mode network. Front. Neurosci. 7, 258. (10.3389/fnins.2013.00258)
  • 3. Juslin PN, Sakka LS, Barradas GT, Liljeström S. 2016. No accounting for taste? Idiographic models of aesthetic judgment in music. Psychol. Aesthet. Creat. Arts 10, 157–170. (10.1037/aca0000034)
  • 4. Pugach C, Leder H, Graham DJ. 2017. How stable are human aesthetic preferences across the lifespan? Front. Hum. Neurosci. 11, 289. (10.3389/fnhum.2017.00289)
  • 5. Martindale C, Moore K, West A. 1988. Relationship of preference judgments to typicality, novelty, and mere exposure. Empir. Stud. Arts 6, 79–96. (10.2190/MCAJ-0GQT-DJTL-LNQD)
  • 6. Hargreaves DJ, North AC, Tarrant M. 2015. How and why do musical preferences change in childhood and adolescence? In The child as musician: a handbook of musical development (ed. McPherson GE), pp. 303–322. Oxford, UK: Oxford University Press.
  • 7. Martindale C, Moore K, Borkum J. 1990. Aesthetic preference: anomalous findings for Berlyne's psychobiological theory. Am. J. Psychol. 103, 53–80. (10.2307/1423259)
  • 8. Vessel EA, Rubin N. 2010. Beauty and the beholder: highly individual taste for abstract, but not real-world images. J. Vis. 10, 1–14. (10.1167/10.2.18)
  • 9. Reber R, Schwarz N, Winkielman P. 2004. Processing fluency and aesthetic pleasure: is beauty in the perceiver's processing experience? Pers. Soc. Psychol. Rev. 8, 364–382.
  • 10. Munsinger H, Kessen W. 1964. Uncertainty, structure, and preference. Psychol. Monogr. Gen. Appl. 78, 1–24. (10.1037/h0093865)
  • 11. Berlyne D. 1971. Aesthetics and psychobiology. New York, NY: Appleton-Century-Crofts.
  • 12. Gombrich E. 1995. The story of art, 16th edn. London, UK: Phaidon.
  • 13. Ding N, Patel AD, Chen L, Butler H, Luo C, Poeppel D. 2017. Temporal modulations in speech and music. Neurosci. Biobehav. Rev. 81, 181–187. (10.1016/j.neubiorev.2017.02.011)
  • 14. Farbood MM, Marcus G, Poeppel D. 2013. Temporal dynamics and the identification of musical key. J. Exp. Psychol. Hum. Percept. Perform. 39, 911–918. (10.1037/a0031087)
  • 15. Blohm S, Versace S, Methner S, Wagner V. 2017. Eye movements and acoustic evidence reveal behavioural differences between poetry and prose reading. Poster presented at the 23rd AMLaP conference, Lancaster, UK.
  • 16. Byers PP. 1979. A formula for poetic intonation. Poetics 8, 367–380. (10.1016/0304-422X(79)90007-X)
  • 17. Kraxenberger M, Menninghaus W, Roth A, Scharinger M. 2018. Prosody-based sound-emotion associations in poetry. Front. Psychol. 9, 1284. (10.3389/fpsyg.2018.01284)
  • 18. Kowal S, O'Connell DC, O'Brien EA, Bryant ET. 1975. Temporal aspects of reading aloud and speaking: three experiments. Am. J. Psychol. 88, 549. (10.2307/1421893)
  • 19. Funkhouser L, O'Connell DC. 1978. Temporal aspects of poetry readings by authors and adults. Bull. Psychon. Soc. 12, 390–392. (10.3758/BF03329717)
  • 20. Doelling KB, Poeppel D. 2015. Cortical entrainment to music and its modulation by expertise. Proc. Natl Acad. Sci. 112, E6233–E6242. (10.1073/pnas.1508431112)
  • 21. Nozaradan S, Peretz I, Missal M, Mouraux A. 2011. Tagging the neuronal entrainment to beat and meter. J. Neurosci. 31, 10 234–10 240. (10.1523/JNEUROSCI.0411-11.2011)
  • 22. Teng X, Tian X, Rowland J, Poeppel D. 2017. Concurrent temporal channels for auditory processing: oscillatory neural entrainment reveals segregation of function at different scales. PLoS Biol. 15, e2000812. (10.1371/journal.pbio.2000812)
  • 23. Teng X, Poeppel D. 2019. Theta and gamma bands encode acoustic dynamics over wide-ranging timescales. Cereb. Cortex 25, 3077.
  • 24. London J. 2002. Cognitive constraints on metric systems: some observations and hypotheses. Music Percept. 19, 529–550. (10.1525/mp.2002.19.4.529)
  • 25. North AC, Hargreaves DJ. 1995. Subjective complexity, familiarity, and liking for popular music. Psychomusicol. A J. Res. Music Cogn. 14, 77–93. (10.1037/h0094090)
  • 26. Chmiel A, Schubert E. 2017. Back to the inverted-U for music preference: a review of the literature. Psychol. Music 45, 886–909. (10.1177/0305735617697507)
  • 27. Zajonc RB. 1968. Attitudinal effects of mere exposure. J. Pers. Soc. Psychol. 9(2, Pt. 2), 1–27. (10.1037/h0025848)
  • 28. Vessel EA, Maurer N, Denker AH, Starr GG. 2018. Stronger shared taste for natural aesthetic domains than for artifacts of human culture. Cognition 179, 121–131. (10.1016/j.cognition.2018.06.009)
  • 29. Leder H, Goller J, Rigotti T, Forster M. 2016. Private and shared taste in art and face appreciation. Front. Hum. Neurosci. 10, 155. (10.3389/fnhum.2016.00155)
  • 30. Tchernichovski O, Mitra PP, Lints T, Nottebohm F. 2001. Dynamics of the vocal imitation process: how a zebra finch learns its song. Science 291, 2564–2569. (10.1126/science.1058522)
  • 31. Digirolamo GJ, Hintzman DL. 1997. First impressions are lasting impressions: a primacy effect in memory for repetitions. Psychon. Bull. Rev. 4, 121–124. (10.3758/BF03210784)
  • 32. Miller JK, Westerman DL, Lloyd ME. 2004. Are first impressions lasting impressions? An exploration of the generality of the primacy effect in memory for repetitions. Mem. Cognit. 32, 1305–1315. (10.3758/BF03206321)
  • 33. Hargreaves DJ. 1984. The effects of repetition on liking for music. J. Res. Music Educ. 32, 35–47. (10.2307/3345279)
  • 34. Chmiel A, Schubert E. 2018. Unusualness as a predictor of music preference. Music. Sci. 23, 426–441. (10.1177/0305735617697507)
  • 35. Van Hedger SC, Nusbaum HC, Heald SLM, Huang A, Kotabe HP, Berman MG. 2019. The aesthetic preference for nature sounds depends on sound object recognition. Cogn. Sci. 43, e12734. (10.1111/cogs.12734)
  • 36. Agus TR, Thorpe SJ, Pressnitzer D. 2010. Rapid formation of robust auditory memories: insights from noise. Neuron 66, 610–618. (10.1016/j.neuron.2010.04.014)
  • 37. Luo H, Tian X, Song K, Zhou K, Poeppel D. 2013. Neural response phase tracks how listeners learn new acoustic representations. Curr. Biol. 23, 968–974. (10.1016/j.cub.2013.04.031)
  • 38. Levitin DJ, Cook PR. 1996. Memory for musical tempo: additional evidence that auditory memory is absolute. Percept. Psychophys. 58, 927–935. (10.3758/BF03205494)
  • 39. Gratton I, Brandimonte MA, Bruno N. 2016. Absolute memory for tempo in musicians and non-musicians. PLoS ONE 11, e0163558. (10.1371/journal.pone.0163558)
  • 40. Gordon MS. 2016. Absolute tempo perception of popular music. Psychomusicol. Music Mind Brain 26, 236–246. (10.1037/pmu0000154)
  • 41. McAuley JD. 2010. Tempo and rhythm, pp. 165–199. New York, NY: Springer.
  • 42. Fraisse P. 1982. Rhythm and tempo. In The psychology of music: cognition and perception (ed. Deutsch D), pp. 149–180. New York, NY: Academic Press.
  • 43. Moelants D. 2002. Preferred tempo reconsidered. In Proceedings of the 7th International Conference on Music Perception and Cognition, Sydney (eds Stevens C, Burnham D, McPherson G, Schubert E, Renwick J), pp. 580–583. Adelaide, Australia: Causal Productions.
  • 44. Stern W. 1900. Das psychische Tempo. In Über Psychologie der individuellen Differenzen: Ideen zu einer ‘differentiellen Psychologie'. Leipzig, Germany: JA Barth.
  • 45. McAuley JD, Jones MR, Holub S, Johnston HM, Miller NS. 2006. The time of our lives: life span development of timing and event tracking. J. Exp. Psychol. Gen. 135, 348–367. (10.1037/0096-3445.135.3.348)
  • 46. Frischeisen-Köhler I. 1933. The personal tempo and its inheritance. Character and Personality 1, 301–313.
  • 47. Harrell TW. 1937. Factors influencing preference and memory for auditory rhythm. J. Gen. Psychol. 17, 63–104. (10.1080/00221309.1937.9917974)
  • 48. Overath T, McDermott JH, Zarate JM, Poeppel D. 2015. The cortical analysis of speech-specific temporal structure revealed by responses to sound quilts. Nat. Neurosci. 18, 903–911. (10.1038/nn.4021)
  • 49. Teng X, Tian X, Doelling K, Poeppel D. 2017. Theta band oscillations reflect more than entrainment: behavioral and neural evidence demonstrates an active chunking process. Eur. J. Neurosci. 48, 2770–2782. (10.1111/ejn.13742)



Articles from Proceedings of the Royal Society B: Biological Sciences are provided here courtesy of The Royal Society