Abstract
While a number of theories have been advanced to account for why musical consonance is related to simple frequency ratios, as yet there is no completely satisfying explanation. Here, we explore the theory of synchronization properties of ensembles of coupled neural oscillators to demonstrate why simple frequency ratios may have achieved a special status and why they are important for auditory perception. The analysis shows that the ordering of the mode-locked states gives precisely the standard ordering of consonance as often listed in Western music theory. Our results thus indicate the importance of neural synchrony in musical perception.
Keywords: consonance, dissonance, neural synchronization, mode locking, coupled oscillator, musical interval
1. Introduction
For more than two millennia musicians and theorists have debated the factors that give rise to the perception of musical consonance and dissonance (Helmholtz 1877; Plomp & Levelt 1965; Roederer 1975; Tenney 1988; Hartmann 1998). Although there is no single musical definition, consonance usually refers to the pleasant, ‘stable’ sound sensation produced by certain combinations of two tones played simultaneously. By contrast, dissonance is the unpleasant, grating sound heard with other combinations. The octave, for example, is judged as consonant, while playing two adjacent keys on the piano together (i.e. a semitone) is perceived as dissonant (see electronic supplementary material). The dominant theory to explain these sensations is attributed to Pythagoras and suggests that the simpler the frequency ratio between two tones, the more consonant they are perceived to be, the sonority being reflected in the resulting interval's ‘pleasantness’ (figure 1). Consider two pure tones having frequencies f1=P and f2=Q. According to the Pythagorean view, the consonance of the two tones may be ordered by the simplicity of their relative integer frequency ratio P : Q (Roederer 1975; Tenney 1988). Galileo argued that tones in simple integer ratios are ‘commensurable in number, so as not to keep the ear drum in perpetual torment’ (Tenney 1988). Thus, the consonant octave is characterized by a 1 : 2 frequency ratio between two tones, while the dissonant semitone is characterized by a 15 : 16 ratio. In Western culture, the intervals are often listed in the decreasing order of ‘perfection’ shown in table 1.
Table 1. (Rows are ordered from consonance at the top to dissonance at the bottom.)

| interval's evaluation | interval's name | interval's ratio | ΔΩ |
|---|---|---|---|
| absolute consonances | unison | 1 : 1 | 0.075 |
| absolute consonances | octave | 1 : 2 | 0.023 |
| perfect consonances | fifth | 2 : 3 | 0.022 |
| perfect consonances | fourth | 3 : 4 | 0.012 |
| medial consonances | major sixth | 3 : 5 | 0.010 |
| medial consonances | major third | 4 : 5 | 0.010 |
| imperfect consonances | minor third | 5 : 6 | 0.010 |
| imperfect consonances | minor sixth | 5 : 8 | 0.007 |
| dissonances | major second | 8 : 9 | 0.006 |
| dissonances | major seventh | 8 : 15 | 0.005 |
| dissonances | minor seventh | 9 : 16 | 0.003 |
| dissonances | minor second | 15 : 16 | — |
| dissonances | tritone | 32 : 45 | — |
Preference for musical intervals of simple frequency ratios such as the octave, fifth and fourth, might simply reflect education or immersion and exposure to Western musical practices. Cross-cultural examinations of scale structure in music show that there is a high preponderance of fifths (2 : 3), fourths (3 : 4) and octaves (1 : 2; Schellenberg & Trehub 1994b). Moreover, it is well known that the simplicity of frequency ratios has played a central role in musical theories of intervallic consonance and dissonance (Helmholtz 1877; Tenney 1988). It has thus become a common view that musical consonance is, to a possibly large extent, learnt through exposure to musical culture. The learning process might thus be chiefly responsible for the special status of tones related by simple frequency ratios. By contrast, Schellenberg & Trehub (1994a,b, 1996a,b) attempted to explore the possibility that the special perceptual status of intervals with simple frequency ratios derives from a natural or inherent biological basis. This was achieved by evaluating infants' ability to detect subtle changes to patterns of simultaneous and sequential pure tones. Their results confirmed that simple, as opposed to complex, frequency ratios are more readily identified by listeners and consequently are more likely to result in a stable perceptual representation. As this was true even for infants, the perceptual status of these special intervals is unlikely to be due to education or exposure to Western musical practices.
2. Helmholtz's theory of beating harmonics
A scientific basis for the phenomenon of consonance and dissonance was established by Helmholtz (1877) and was based on the number and strength of ‘beating’ harmonics in a pair of simultaneous complex tones (Roederer 1975; Hartmann 1998). Helmholtz argued that for two complex tones in unison (P : Q=1 : 1) or an octave apart (P : Q=1 : 2), all harmonics of the second tone are aligned and coincident in frequency with the first, and thus the outcome is highly consonant. However, as the frequency ratio P : Q becomes more ‘complicated’, the two tones share fewer common harmonics, while there is an increase of harmonic pairs slightly mismatched in frequency. According to Helmholtz's (1877) linear theory, these latter nearby harmonics interact and lead to an unpleasant beating sensation that results in dissonance.
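Helmholtz's counting argument can be made concrete with a short sketch. It assumes idealized complex tones whose harmonics extend indefinitely; the function name `shared_fraction` and the cut-off `n_harmonics` are illustrative choices, not part of the original analysis. For a ratio P : Q, the code counts what fraction of the upper tone's harmonics coincide exactly with some harmonic of the lower tone.

```python
from fractions import Fraction
from math import gcd


def shared_fraction(p, q, n_harmonics=60):
    """Fraction of the first n_harmonics harmonics of the upper tone
    (fundamental q) that coincide with a harmonic of the lower tone
    (fundamental p). The lower tone is assumed to have unlimited harmonics."""
    g = gcd(p, q)
    p, q = p // g, q // g  # reduce the ratio to lowest terms
    # Harmonic m of the upper tone sits at m*q; it coincides with a
    # harmonic n*p of the lower tone exactly when p divides m*q.
    hits = sum(1 for m in range(1, n_harmonics + 1) if (m * q) % p == 0)
    return Fraction(hits, n_harmonics)


if __name__ == "__main__":
    for p, q in [(1, 2), (2, 3), (15, 16)]:
        print(f"{p}:{q} shared fraction = {shared_fraction(p, q)}")
```

Under these assumptions the octave (1 : 2) shares every harmonic of the upper tone, the fifth (2 : 3) shares one in two, and the semitone (15 : 16) only one in fifteen, which mirrors the qualitative trend described above.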
The beating effect may be understood mathematically by considering the linear addition of two pure sine tones (i.e. with no harmonics) having almost the same frequencies ω1 and ω2=ω1+δ, both of the same amplitude. Summing these signals linearly gives
$$\sin(\omega_1 t)+\sin(\omega_2 t)=2\cos\!\left(\frac{\delta t}{2}\right)\sin(\bar{\omega}t), \tag{2.1}$$

where the average frequency $\bar{\omega}=(\omega_1+\omega_2)/2$. Thus, a listener will not have the impression of listening to two different frequencies but instead will hear a single pure tone with a pitch corresponding to the average frequency and with loudness that varies slowly leaving a beating sensation oscillating with an envelope at frequency δ=ω2−ω1. The beating disappears only after surpassing a sufficiently large frequency difference, at least δ>15 Hz (see Roederer 1975, p. 28). All signs of roughness disappear when the frequency difference surpasses ‘the critical bandwidth’, which is approximately 10–20% of the centre frequency for frequencies greater than 500 Hz, and pure tones sound both ‘smooth’ and ‘pleasing’ (Plomp & Levelt 1965; Roederer 1975, p. 28).
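The beating effect can be checked numerically. The sketch below assumes unit-amplitude tones and uses the standard sum-to-product identity, sin(ω1 t) + sin(ω2 t) = 2 cos(δt/2) sin(ω̄t) with ω̄ = (ω1+ω2)/2; the function names and the sample values (440 and 6) are illustrative assumptions.

```python
import math


def beat_sum(omega1, delta, t):
    """Direct linear sum of two unit-amplitude sine tones at
    angular frequencies omega1 and omega1 + delta."""
    return math.sin(omega1 * t) + math.sin((omega1 + delta) * t)


def beat_product(omega1, delta, t):
    """The same signal written as a slow envelope times a carrier
    at the average frequency."""
    mean = omega1 + delta / 2.0
    return 2.0 * math.cos(delta * t / 2.0) * math.sin(mean * t)


def envelope(delta, t):
    """Slowly varying loudness envelope |2 cos(delta * t / 2)|."""
    return abs(2.0 * math.cos(delta * t / 2.0))


if __name__ == "__main__":
    omega1, delta = 440.0, 6.0
    for t in (0.0, 0.1, 0.25, 0.5):
        gap = abs(beat_sum(omega1, delta, t) - beat_product(omega1, delta, t))
        print(f"t={t:4.2f}  envelope={envelope(delta, t):.4f}  mismatch={gap:.1e}")
```

The envelope is maximal at t = 0 and vanishes at t = π/δ, so the loudness repeats with period 2π/δ, which is the slow beat described in the text.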
3. Problems with Helmholtz's theory
Helmholtz's (1877) theory is scientifically appealing, yet it remains controversial and fails to explain a number of non-trivial aspects central to musical psychoacoustics.
Plomp & Levelt (1965) have demonstrated that once the frequency difference δ between two pure tones exceeds 3 semitones (i.e. beyond the critical bandwidth), no roughness can be experienced by the ear. However, beyond this critical bandwidth the evaluation of consonance can vary considerably and change direction (with peaks and valleys) as δ increases. These changes of consonance occur despite the absence of harmonics, and thus in a regime where beats should be entirely absent. Clearly, Helmholtz's theory of beats is unable to explain these consonance sensations.
When applying sequential pure tones that do not enter the ear simultaneously, Helmholtz's theory would no longer seem applicable. Nevertheless, sequential pure-tone intervals with simple (as opposed to complex) frequency ratios were found to be more ‘readily processed by listeners’ (Schellenberg & Trehub 1996a). Here, ease of processing a tone pattern referred to enhanced discrimination of that pattern in experiments. This suggests a special perceptual status for intervals with simple frequency ratios.
Experimental studies have shown that patients with auditory cortex lesions lack the ability to evaluate consonance in a similar manner to healthy subjects (Peretz et al. 2001; Tramo et al. 2001). This raises the question as to whether the source of musical perception is governed by peripheral mechanisms in the inner ear as held by Helmholtz. Rather it suggests the existence of specific neural pathways that are devoted to dissonance computation and that can be disrupted selectively by brain damage (Tramo et al. 2001).
The EEG responses of subjects to pairs of pure tones show that neural processing of consonance depends on higher associative processing of pitch relationships in the cerebral cortex (Itoh et al. 2003). That is, consonance is not just the absence of roughness but is determined by neural processing in the auditory cortex. Itoh et al. (2003) reached this conclusion by studying the auditory evoked potentials indicative of cortical activity response. Of the intervals studied (1, 4, 6, 7, 9 semitones), they found that in all cases the evoked potentials were at their highest (in terms of voltage) for two pure tones separated by a perfect fifth (7 semitones) when compared with other intervals. These results provide electrophysiological evidence that matches behavioural preference for simple frequency ratios. Given that only pure tones were used in the experiments, this preference cannot be attributed to the beating of harmonics, which forms the basis of Helmholtz's theory (1877).
4. Coupled oscillator model of auditory perception
We are thus led to ask, over and above Helmholtz's beating phenomena, why do some combinations of tones sound more pleasant than others? The answer to this question may well have to do with the nonlinear dynamics of auditory perception, in contrast to Helmholtz's solely linear framework. Consider then, two coupled ‘integrate and fire’ neural oscillators that in the absence of coupling have distinct frequencies ω1 and ω2 and a relative frequency ratio Ω=ω1/ω2. Each oscillator might typically represent a neuron, or a population of neurons. Such signals are processed in the auditory cortex within the right superior temporal gyrus that is believed to be involved in the analysis of pitch and timbre (Samson & Zatorre 1994; Zatorre et al. 1994; Blood et al. 1999). In response to a specific auditory tone frequency stimulating the cochlea, such an oscillator would fire at a given frequency. For modelling simplicity, firing frequencies may be the same as the driving frequencies, but in reality may be scaled-down versions of them, since neurons cannot fire at rates much beyond a kilohertz.
A simple scheme of two mutually coupled oscillators that captures the generic behaviour consists of two voltage variables x1, x2 as follows (Coombes & Lord 1997):
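$$\frac{dx_1}{dt}=-\frac{x_1}{\tau_1}+I_1+\epsilon E_1(t),\qquad \frac{dx_2}{dt}=-\frac{x_2}{\tau_2}+I_2+\epsilon E_2(t). \tag{4.1}$$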
Here τ1, τ2 are decay constants; E1(t) represents the effect of neuron-2 on neuron-1 and vice versa; I1 and I2 represent the external input that x1 and x2 receive, respectively; and ϵ represents the strength of coupling between the neurons.
The first oscillator (x1) increases in voltage and ‘fires’ only when it reaches a fixed threshold (x1=1). After firing, the oscillator is instantaneously reset to zero (x1=0), while the voltage of the second oscillator is instantaneously increased by ϵE2(t), i.e.

$$x_1(t)=1\;\Rightarrow\;x_1(t^{+})=0,\quad x_2(t^{+})=x_2(t)+\epsilon E_2(t).$$

The strength of coupling between the oscillators is thus determined by ϵ. One of the simplest coupling schemes assumes that communication between the neurons is via a sharp infinitesimal pulse, such as the Dirac δ-function (Mirollo & Strogatz 1990),

$$E_2(t)=\sum_j \delta\big(t-t_j^{(1)}\big),$$

where $t_j^{(1)}$ denotes the jth firing time of oscillator-1. The firing of neuron x1 thus results in an increase by an amount ϵ in the voltage of oscillator-2.
The simple Dirac δ-function pulse is only a first approximation. In reality, the effective input to the neuron has a longer temporal duration due to the synaptic transmission process. One particular pulse shape that approximates the rise and fall time of real synaptic currents in a realistic fashion is of the following form (Jack et al. 1975):
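$$\alpha(t)=\alpha^{2}\,t\,e^{-\alpha t}\,\Theta(t). \tag{4.2}$$

Here α(t) represents the exponential rise (and fall) of the synapse of x as shown in figure 2, and Θ(t) is a step function such that

$$\Theta(t)=\begin{cases}1, & t\ge 0,\\ 0, & t<0.\end{cases}$$

The maximal synaptic response occurs at a time α−1 after the arrival of an action potential (Coombes & Lord 1997). In practice, the final input to the neuron is a sum of distributed delays represented by alpha functions, which gives (Coombes & Lord 1997)

$$E_2(t)=\sum_j \alpha\big(t-t_j^{(1)}\big), \tag{4.3}$$

with $t_j^{(1)}$ the jth firing time of oscillator-1, and analogously for $E_1(t)$.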
The above formula makes allowance for the fact that the voltage of the oscillator is increased by an amount calculated over the weighted sum of all past firings of its neighbouring coupled oscillator. (For α→∞, the simple case of coupling via a δ-function is retrieved.)
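The properties of the synaptic pulse quoted above can be checked with a minimal sketch. It assumes the standard alpha-function form α(t) = α² t e^(−αt) for t ≥ 0 (zero otherwise); the function names, the firing times and the integration settings are illustrative assumptions.

```python
import math


def alpha_pulse(t, alpha):
    """Alpha-function synaptic pulse: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def synaptic_input(t, firing_times, alpha):
    """Total input at time t: one alpha pulse per past firing of the neighbour."""
    return sum(alpha_pulse(t - t_j, alpha) for t_j in firing_times)


def pulse_area(alpha, t_max=50.0, steps=200000):
    """Trapezoidal estimate of the integral of alpha_pulse over [0, t_max]."""
    h = t_max / steps
    total = 0.5 * (alpha_pulse(0.0, alpha) + alpha_pulse(t_max, alpha))
    total += sum(alpha_pulse(i * h, alpha) for i in range(1, steps))
    return total * h


if __name__ == "__main__":
    a = 2.0
    print("value at t = 1/alpha:", alpha_pulse(1.0 / a, a))
    print("area under one pulse:", pulse_area(a))
    print("input at t = 1.0:", synaptic_input(1.0, [0.0, 0.5, 2.0], a))
```

With this form the pulse peaks at t = 1/α and has unit area, so as α grows it narrows towards a δ-like pulse of the same total weight, consistent with the limiting case noted above.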
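The frequency ωi of oscillator-i when uncoupled is found by solving the differential equation

$$\frac{dx_i}{dt}=-\frac{x_i}{\tau_i}+I_i$$

to obtain $x_i(t)=I_i\tau_i\,(1-e^{-t/\tau_i})$, given the initial conditions x=0 at t=0. Note that one firing occurs in the time frame $0\le t\le T_i$, where x=0 changes to x=1 at $t=T_i$. The period of the oscillator's firing cycle can be calculated by inserting x=1 in the above equation giving $T_i=\tau_i\ln[I_i\tau_i/(I_i\tau_i-1)]$. Thus, the natural firing frequency of the oscillator when uncoupled is

$$\omega_i=\frac{1}{T_i}=\left[\tau_i\ln\frac{I_i\tau_i}{I_i\tau_i-1}\right]^{-1}.$$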
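As a sanity check on the uncoupled oscillator, the sketch below compares the closed-form period of a leaky integrate-and-fire unit, T = τ ln[Iτ/(Iτ−1)] (the standard result for dx/dt = −x/τ + I with threshold 1 and reset 0, assumed here), against a direct forward-Euler integration. The function names and the step size are illustrative assumptions.

```python
import math


def natural_period(tau, current):
    """Closed-form firing period T = tau * ln(I*tau / (I*tau - 1)).
    The oscillator only reaches threshold when I*tau > 1."""
    drive = current * tau
    if drive <= 1.0:
        raise ValueError("I * tau must exceed 1 for the oscillator to fire")
    return tau * math.log(drive / (drive - 1.0))


def simulated_period(tau, current, dt=1e-4):
    """Forward-Euler integration of dx/dt = -x/tau + I from x = 0
    until the threshold x = 1 is first reached."""
    x, t = 0.0, 0.0
    while x < 1.0:
        x += dt * (-x / tau + current)
        t += dt
    return t


if __name__ == "__main__":
    tau, current = 1.0, 2.0
    period = natural_period(tau, current)
    print("closed-form period :", period)
    print("simulated period   :", simulated_period(tau, current))
    print("natural frequency  :", 1.0 / period)
```

For τ = 1 and I = 2 the closed-form period is ln 2, and the simulated value agrees to within the Euler step error.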
5. Synchronization and simple frequency ratios
By virtue of the coupling, the two oscillators are able to synchronize or ‘mode lock’ (Schuster 1995; Coombes & Lord 1997) so that their firing patterns repeat with the same fixed period. Figure 3 shows time series of the two oscillators in a 2 : 3 mode-locked state. To understand the subtleties of mode locking in more detail, one needs to compare the ratio of the observed oscillator frequencies when coupled Δ1/Δ2 to the ratio of the oscillators' natural intrinsic frequencies Ω=ω1/ω2. The oscillators tend to mode lock to a simple firing ratio P : Q=Δ1/Δ2 which is close but not necessarily equal to the ratio of the oscillators' intrinsic frequencies Ω=ω1/ω2. The beauty of the synchronization is that the mode-locked state (e.g. 1 : 2) is stable to small changes in the frequencies ω1 or ω2 and thus Ω. In practice this means that should the intrinsic frequencies of the oscillators change slightly, the system's synchronized solution will nevertheless remain unaffected. This is demonstrated graphically in figure 4, where Ω=ω1/ω2 is varied; yet there are horizontal plateaus where the system's synchronized solution P : Q=Δ1/Δ2 stays unchanged.
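The robustness of a mode-locked state to small frequency changes can be illustrated with a minimal event-driven sketch. It is not the simulation used for the figures: it assumes δ-pulse coupling (each firing adds ϵ to the other oscillator), a common decay constant τ, and the simplifying rule that an oscillator pushed over threshold by a pulse fires at once without pulsing back. The function names and parameter values are illustrative assumptions.

```python
import math


def time_to_fire(x, current, tau):
    """Time for dx/dt = -x/tau + I to rise from x to the threshold 1."""
    return tau * math.log((current * tau - x) / (current * tau - 1.0))


def advance(x, current, tau, dt):
    """Exact solution of dx/dt = -x/tau + I after a time dt."""
    return current * tau + (x - current * tau) * math.exp(-dt / tau)


def firing_ratio(i1, i2, eps, tau=1.0, t_end=200.0):
    """Observed firing-count ratio of oscillator 1 to oscillator 2."""
    currents = [i1, i2]
    if min(currents) * tau <= 1.0:
        raise ValueError("each I * tau must exceed 1 so both oscillators fire")
    x = [0.0, 0.0]
    counts = [0, 0]
    t = 0.0
    while True:
        waits = [time_to_fire(x[k], currents[k], tau) for k in (0, 1)]
        k = 0 if waits[0] <= waits[1] else 1  # next oscillator to fire
        if t + waits[k] > t_end:
            break
        t += waits[k]
        other = 1 - k
        # The other oscillator evolves freely, then receives the pulse eps.
        x[other] = advance(x[other], currents[other], tau, waits[k]) + eps
        x[k] = 0.0
        counts[k] += 1
        if x[other] >= 1.0:  # pushed over threshold: fires immediately
            x[other] = 0.0
            counts[other] += 1
    return counts[0] / counts[1]


if __name__ == "__main__":
    print("uncoupled ratio:", firing_ratio(2.0, 2.05, 0.0))
    print("coupled ratio  :", firing_ratio(2.0, 2.05, 0.1))
```

With inputs 2.0 and 2.05 the uncoupled ratio is close to the intrinsic ratio Ω (about 0.965), whereas with ϵ = 0.1 every firing of the faster oscillator triggers the slower one and the observed ratio locks to exactly 1 : 1.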
Figure 4 gives simulation results showing the width of the interval ΔΩ for which the ratio Ω=ω1/ω2 may be changed while the mode-locked state P : Q remains constant. The vertical axis in figure 4 corresponds to the ratio of the observed frequencies of the coupled oscillators, namely P : Q=Δ1/Δ2, while the horizontal axis corresponds to the ratio Ω=ω1/ω2. The stability interval of 1 : 1 is marked by ΔΩ1, 1 : 2 by ΔΩ2 and 2 : 3 by ΔΩ3. The complete set of mode-locked states is referred to as a Devil's staircase (Schuster 1995) and is a universal feature of driven coupled oscillators. Note that the width of the mode-locked interval ΔΩ should be considered as an indicator of the structural stability of the synchronization. The wider the interval, the stronger the structural stability. Thus, for example, the unison (1 : 1) might be considered a more stable synchronization than the octave (1 : 2) since ΔΩ1>ΔΩ2. This correspondence between musical intervals and mode-locked states was previously sketched out in Stone (2000).
Table 1 shows a more detailed summary of the ordering of the stability index of the mode-locked states and reveals a correspondence with the theoretical ordering of musical intervals according to their consonance evaluation. The ordering corresponds to ratio simplicity discussed in Schellenberg & Trehub (1994b), where the simplest ratios (e.g. 1 : 1, 1 : 2, 2 : 3) are the most consonant. The ordering corresponds to that given in Helmholtz (1877, pp. 183 and 194) and Roederer (1975, p. 141, table 5.2) who regard it as having been accepted in the Western musical culture.
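The claimed correspondence can be checked directly against the ΔΩ values of table 1. The sketch below transcribes the table (the minor second and tritone are omitted because no ΔΩ is listed for them) and tests that the stability widths never increase down the consonance ordering; the variable and function names are illustrative.

```python
# (interval name, frequency ratio, stability width) transcribed from table 1,
# listed from most consonant to most dissonant.
TABLE_1 = [
    ("unison", "1:1", 0.075),
    ("octave", "1:2", 0.023),
    ("fifth", "2:3", 0.022),
    ("fourth", "3:4", 0.012),
    ("major sixth", "3:5", 0.010),
    ("major third", "4:5", 0.010),
    ("minor third", "5:6", 0.010),
    ("minor sixth", "5:8", 0.007),
    ("major second", "8:9", 0.006),
    ("major seventh", "8:15", 0.005),
    ("minor seventh", "9:16", 0.003),
]


def is_non_increasing(values):
    """True when each value is no larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


WIDTHS = [width for _, _, width in TABLE_1]

if __name__ == "__main__":
    print("widths follow the consonance ordering:", is_non_increasing(WIDTHS))
```

Note that several intervals share the same tabulated width (0.010), so the match is a non-increasing ordering at the precision given rather than a strict one.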
Theoretical arguments from a study of the generic ‘circle map’ also lead us to expect a relationship between the simplicity of the frequency ratio P : Q and the width of the stability interval ΔΩ (Cvitanović et al. 1985). The relationship has been connected to a mathematical construct, the ‘Farey tree’, which orders all rational fractions P/Q in the interval [0,1] according to their increasing denominators Q (Cvitanović et al. 1985). As the circle map is a paradigmatic model for a large class of coupled oscillators, the ordering of intervals by the stability index should be considered parameter independent in general.
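The Farey tree can be generated with a few lines of code. The sketch below uses the standard mediant construction, in which the fraction inserted between neighbours a/b and c/d is (a+c)/(b+d); the function names and the choice of starting from 0/1, 1/2 and 1/1 are assumptions of this illustration.

```python
from fractions import Fraction


def mediant(a, b):
    """Farey mediant of two fractions: add numerators and denominators."""
    return Fraction(a.numerator + b.numerator, a.denominator + b.denominator)


def farey_levels(depth):
    """Levels of the Farey tree on [0, 1]. Level 0 is [1/2]; each later
    level holds the mediants newly inserted between adjacent fractions."""
    seq = [Fraction(0, 1), Fraction(1, 2), Fraction(1, 1)]
    levels = [[Fraction(1, 2)]]
    for _ in range(depth):
        new = [mediant(a, b) for a, b in zip(seq, seq[1:])]
        levels.append(new)
        merged = []
        for a, m in zip(seq, new):
            merged.extend([a, m])  # interleave old fractions with mediants
        merged.append(seq[-1])
        seq = merged
    return levels


if __name__ == "__main__":
    for depth, level in enumerate(farey_levels(3)):
        print(depth, [str(f) for f in level])
```

The ratios of the most consonant intervals in table 1 appear in the first few levels: 1/2 (octave) at level 0, 2/3 (fifth) at level 1, and 3/4 (fourth) and 3/5 (major sixth) at level 2.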
6. Discussion
It should be noted that there may be more than one neural source that contributes to our perception of consonance and dissonance. Neural processing of auditory stimuli is complex, and it is possible that some combination of physical properties at the ear, primary auditory processing and secondary or associative processing play a role in this perception. Synchrony effects underlying these layers of complexity nevertheless may hold important clues in any attempt to explain consonance. Indeed, Cartwright et al. (2001) have explored a similar dynamical systems approach whereby the synchrony characteristics of three coupled oscillators (three frequency resonances), may resolve the puzzling perception of the ‘missing fundamental’. Their theory accounts for the manner in which a fundamental is mysteriously perceived in a set of tones played simultaneously, even though it is absent.
Having presented a theory of consonance and dissonance, it is important to emphasize that the effects we describe are intended to deal solely with pure-tone intervals outside of any musical context. This is to deliberately exclude the emotional component that is evoked when listening to harmonic musical progressions. Thus, the jazz musician might love hearing dissonance in music, but this phenomenon falls outside the scope of the theory presented here.
A selection of examples of consonant and dissonant sounds may be found in the electronic supplementary material.
Although Helmholtz's theory of beating harmonics is a delightful explanation for consonance and dissonance perception, as shown above, it nevertheless fails to account for many phenomena well known in the literature. In such cases, other explanations are needed. Partly owing to this, neural synchrony has in the past been postulated as an important mechanism in auditory perception (Boomsliter & Creel 1961; Palisca & Moore 2001). Palisca & Moore (2001) justify their ‘explanation in terms of the synchrony of neural impulses … [since it] is supported by the observation that both our sense of musical pitch and our ability to make octave matches largely disappear above 5 kHz, the frequency at which neural synchrony no longer appears to operate’ (Palisca & Moore 2001). The model presented here serves to extend their argument since it explains why human preference for simple frequency ratios in pure tones may be a natural consequence of neural synchronization.
7. Glossary
Pure tone is a single frequency tone with no harmonic content (no overtones). This corresponds to a sine wave. It is characterized by the frequency—the number of cycles per second and the amplitude of the cycles.
Complex tone is a combination of the fundamental frequency tone together with its harmonic components (its overtones). The harmonics are integer multiples of the fundamental frequency. For example, if the fundamental frequency is f, the harmonics have frequency 2f, 3f, 4f, etc. Sounds produced from musical instruments are complex tones.
Pitch. A pitch is the perceived fundamental frequency of a tone.
Interval. In music theory, the term interval describes the difference in pitch between the fundamental frequencies of two notes. Intervals may be labelled according to the ratio of frequencies of the two pitches. Important intervals are those using the lowest integers, such as 1 : 1 (unison), 1 : 2 (octave), 2 : 3 (perfect fifth), 3 : 4 (perfect fourth), etc. as shown in table 1.
The ‘just intonation’ tuning (in which the frequencies of notes are related by ratios of integers) is the basic scaling method, but due to practical implementation difficulties on some musical instruments, the ‘equal temperament’ tuning was introduced (in which the octave (1 : 2) is divided into a series of equal steps).
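The difference between the two tunings can be quantified in cents (1200 cents per octave). The sketch below assumes the usual 12 equal steps per octave and the usual semitone counts for each interval (e.g. 7 for the fifth, 4 for the major third); the function names are illustrative.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(ratio)


def equal_tempered_ratio(semitones, steps_per_octave=12):
    """Frequency ratio of an interval spanning the given number of equal steps."""
    return 2.0 ** (semitones / steps_per_octave)


def tempering_error_cents(p, q, semitones):
    """Equal-tempered interval minus the just interval p : q, in cents."""
    return cents(equal_tempered_ratio(semitones)) - cents(q / p)


if __name__ == "__main__":
    for name, p, q, semitones in [
        ("octave", 1, 2, 12),
        ("fifth", 2, 3, 7),
        ("major third", 4, 5, 4),
    ]:
        print(f"{name}: {tempering_error_cents(p, q, semitones):+.3f} cents")
```

Under these assumptions the equal-tempered octave is exact, the fifth is about 2 cents narrower than 2 : 3, and the major third is about 14 cents wider than 4 : 5.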
Sonority is a term that refers to the quality of a musical tone. In particular, it refers to the resonance, richness or fullness of tone.
Acknowledgments
We thank David Earn for encouraging us over many years to finalize this work. We thank Bernd Blasius for initial simulations of an integrate–fire neuron model, and the helpful comments of four referees. We acknowledge the generous support of the James S. McDonnell Foundation and the Adams Super Center for Brain Studies.
References
- Blood A.J, Zatorre R.J, Bermudez P, Evans A.C. Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nat. Neurosci. 1999;2:382–387. doi: 10.1038/7299.
- Boomsliter P, Creel W. The long pattern hypothesis in harmony and hearing. J. Music Theory. 1961;5:2–31. doi: 10.2307/842868.
- Cartwright J.H.E, González D.L, Piro O. Pitch perception: a dynamical-systems perspective. Proc. Natl Acad. Sci. USA. 2001;98:4855–4859. doi: 10.1073/pnas.081070998.
- Coombes S, Lord G.J. Intrinsic modulation of pulse-coupled integrate-and-fire neurons. Phys. Rev. E. 1997;56:5809–5818. doi: 10.1103/PhysRevE.56.5809.
- Cvitanović P, Shraiman B, Söderberg B. Scaling laws of mode lockings in circle maps. Phys. Scr. 1985;32:263–270. doi: 10.1088/0031-8949/32/4/003.
- Hartmann W.M. Springer; New York, NY: 1998. Signals, sound and sensation.
- Helmholtz H. Dover Publications; New York, NY: 1877. On the sensations of tone as a physiological basis for the theory of music.
- Itoh K, Suwazono S, Nakada N. Cortical processing of musical consonance: an evoked potential study. Neuroreport. 2003;14:2303–2306. doi: 10.1097/00001756-200312190-00003.
- Jack J.J.B, Noble D, Tsien R.W. Oxford Science Publications; New York, NY: 1975. Electric current flow in excitable cells.
- Mirollo R.E, Strogatz S.H. Synchronization of pulse-coupled biological oscillators. SIAM J. Appl. Math. 1990;50:1645–1662. doi: 10.1137/0150098.
- Palisca C.V, Moore B.C.J. Consonance. In: Sadie S, editor. The New Grove dictionary of music and musicians. Oxford University Press; Oxford, UK: 2001.
- Peretz I, Blood A.J, Penhune V, Zatorre R. Cortical deafness to dissonance. Brain. 2001;124:928–940. doi: 10.1093/brain/124.5.928.
- Plomp R, Levelt W.J.M. Tonal consonance and critical bandwidth. J. Acoust. Soc. Am. 1965;38:548–560. doi: 10.1121/1.1909741.
- Roederer J.G. Springer; New York, NY: 1975. Introduction to the physics and psychophysics of music.
- Samson S, Zatorre R.J. Contribution of the right temporal lobe to musical timbre discrimination. Neuropsychologia. 1994;32:231–240. doi: 10.1016/0028-3932(94)90008-6.
- Schellenberg E.G, Trehub S.E. Frequency ratios and the perception of tone patterns. Psychon. Bull. Rev. 1994a;1:191–201. doi: 10.3758/BF03200773.
- Schellenberg E.G, Trehub S.E. Frequency ratios and the discrimination of pure tone sequences. Percept. Psychophys. 1994b;56:472–478. doi: 10.3758/bf03206738.
- Schellenberg E.G, Trehub S.E. Children's discrimination of melodic intervals. Dev. Psychol. 1996a;32:1039–1050. doi: 10.1037/0012-1649.32.6.1039.
- Schellenberg E.G, Trehub S.E. Natural musical intervals: evidence from infant listeners. Psychol. Sci. 1996b;7:272–277. doi: 10.1111/j.1467-9280.1996.tb00373.x.
- Schuster H.G. VCH; Weinheim, Germany: 1995. Deterministic chaos.
- Stone L. A nonlinear model of consonance and dissonance. Report, Adams Super-Center for Brain Studies; 2000.
- Tenney J.A. Excelsior Music Publishing Co; New York, NY: 1988. History of consonance and dissonance.
- Tramo M.J, Cariani P.A, Delgutte B, Braida L.D. Neurobiological foundations for the theory of harmony in Western tonal music. Ann. NY Acad. Sci. 2001;930:92–116. doi: 10.1111/j.1749-6632.2001.tb05727.x.
- Zatorre R.J, Evans A.C, Meyer E. Neural mechanisms underlying melodic perception and memory for pitch. J. Neurosci. 1994;14:1908–1919. doi: 10.1523/JNEUROSCI.14-04-01908.1994.