Proceedings of the National Academy of Sciences of the United States of America
2021 Jul 29;118(31):e2014725118. doi: 10.1073/pnas.2014725118

Asymmetry in scales enhances learning of new musical structures

Claire Pelofi a,b,1, Morwaread M Farbood a,b,c
PMCID: PMC8346874  PMID: 34326245

Significance

This study examines a fundamental aspect of human cognition: how listeners learn musical systems. It provides evidence that certain types of symmetry featured in musical scales help listeners process melodic and tonal information more easily. We propose that this cognitive benefit is the reason for the prevalence of unevenly spaced notes in scales across musical cultures. More broadly, this work offers a cognitive perspective on the trade-off between cross-cultural diversity and fundamental similarity in a universal human activity: music.

Keywords: universals, syntactic learning, expectancies, musical scale, musical cultures

Abstract

Despite the remarkable variability music displays across cultures, certain recurrent musical features motivate the hypothesis that fundamental cognitive principles constrain the way music is produced. One such feature concerns the structure of musical scales. The vast majority of musical cultures use scales that are not uniformly symmetric—that is, scales that contain notes spread unevenly across the octave. Here we present evidence that the structure of musical scales has a substantial impact on how listeners learn new musical systems. Three experiments were conducted to test the hypothesis that nonuniformity facilitates the processing of melodies. Novel melodic stimuli were composed based on artificial grammars using scales with different levels of symmetry. Experiment 1 tested the acquisition of tonal hierarchies and melodic regularities on three different 12-tone equal-tempered scales using a finite-state grammar. Experiments 2 and 3 used more flexible Markov-chain grammars and were designed to generalize the effect to 14-tone and 16-tone equal-tempered scales. The results showed that performance was significantly enhanced by scale structures that specified the tonal space by providing unique intervallic relations between notes. These results suggest that the learning of novel musical systems is modulated by the symmetry of scales, which in turn may explain the prevalence of nonuniform scales across musical cultures.


Music is a fascinating phenomenon from the perspective of human cognition: It is ubiquitous in human cultures (1, 2) and serves a role in everyday life for most individuals (3) but displays extraordinary variety in its timbral (4), rhythmic (5), and tonal characteristics (6) as well as social functions (7–9). The question of whether music possesses universal properties is a debated topic. The claim that music exhibits universal features has long faced skepticism (10–12), but in the light of past studies, it appears increasingly plausible (6, 13–19).

Until recently, discussions of musical universals were largely uninformed by systematically gathered, quantitative data. However, a growing body of ethnomusicological data (6, 20), together with the modern tools of statistical analysis inspired by evolutionary biology (21), have recently shed light on the question of musical universals. By quantitatively determining the prevalence of particular musical features (6, 19, 22), this line of research has confirmed that music is found in every culture, fulfills social needs, and exhibits what Western listeners have identified as a form of tonality (i.e., the existence of a tonal center among the set of discrete pitches) (6).

More specifically, a certain property of musical scales has been reported to constitute one of the main recurrent musical features across cultures. Savage et al. (19) examined 32 candidate features of “musical universality” and found that “nonequidistant scales”—or scales that are composed of intervals of different sizes—constitute the second-most common feature. In line with this is the observation that one of the oldest instruments ever found, a bone flute presumably constructed by Neanderthals 43,000 y ago, was designed to play unevenly spaced scale notes (23). In contrast, “equal-step” scales have been observed extremely rarely across musical cultures; notable examples include the famous Slendro scale from Java (18), some Western art music by Claude Debussy and other 20th-century composers (24), and Western serial compositions using the 12-tone chromatic scale (25).*

The pervasiveness of certain features in an activity as ubiquitous as music can be explained by the interaction of cross-cultural influences and the existence of sensory and cognitive constraints that shape the way music is enjoyed and hence produced. This study aims to examine those cognitive constraints for the particular case of musical scales. Music theorists have noted that common scales, such as the pentatonic scale and Western diatonic scale, display certain structural properties regarding the positioning of the tones along the octave that could promote the processing of musical information—in particular with respect to locating a tonal center (27, 28). Most known scales display intervals of different sizes (whole tones and semitones in the Western system), which are positioned around the octave in a way that maximizes uniqueness through intervallic, nonuniform structures (29). A pitch set satisfies the property of uniqueness if “each of its elements has a unique set of relations with the others and therefore has the potentiality for a unique musical role or ‘dynamic quality’” (ref. 30, p. 8). In turn, scale structures fulfilling uniqueness could enhance “position finding”—that is, having a sense of the position of each tone relative to other tones (27). As noted by Balzano (29), the Western diatonic scale possesses an axis of symmetry (i.e., the sequence of intervals formed by starting at a particular tone and moving both clockwise and counterclockwise around the circle is identical); however, it still fulfills the uniqueness property due to the directional nature (low/high) of pitch perception. This type of symmetry—which we term reflective symmetry—and the concept of uniqueness have also been reported in rhythms, mostly from West Africa (31), where they might serve a similar purpose of cognitive facilitation of listener orientation in a cyclical space. In contrast, the whole-tone scale and other uniform scales do not satisfy the uniqueness property because they are completely symmetric. Finally, intermediate levels of uniqueness can be found in certain scales that are transpositionally limited (32) and thus display rotational symmetry. Balzano (29) remarks that with regard to these scales, “it is the very symmetry and apparent elegance of the set that is its undoing with respect to Uniqueness” (ref. 29, p. 326).
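
To make the uniqueness property concrete, the following is a minimal R sketch (R being the language used for the statistical analyses reported below) that computes, for each scale tone, the profile of directed intervals to all other scale tones and checks whether those profiles are all distinct. The pitch-class sets are the standard diatonic and whole-tone collections; the function names are illustrative, not from the original study.

```r
# Balzano's uniqueness property: every scale tone must have a distinct
# profile of directed (ascending) intervals to the other scale tones.
interval_profile <- function(scale, tone, octave = 12) {
  sort((scale - tone) %% octave)  # directed intervals, in 12-TET steps
}

satisfies_uniqueness <- function(scale, octave = 12) {
  profiles <- vapply(scale, function(t)
    paste(interval_profile(scale, t, octave), collapse = "-"), character(1))
  length(unique(profiles)) == length(scale)
}

diatonic   <- c(0, 2, 4, 5, 7, 9, 11)  # Western major scale
whole_tone <- c(0, 2, 4, 6, 8, 10)     # uniform-symmetric scale

satisfies_uniqueness(diatonic)    # TRUE: each tone is uniquely situated
satisfies_uniqueness(whole_tone)  # FALSE: all six tones share one profile
```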

Despite this rich theoretical literature on intervallic structures in scales and their potential to facilitate cognitive processing of music, only one previous study to our knowledge has explored the cognitive advantage of certain scale features from an experimental psychology perspective. Using an out-of-tune note detection task, Trehub et al. (33) showed that it was easier for infants to detect out-of-tune notes in the context of an unfamiliar nonequal-step scale compared to an unfamiliar equal-step scale. This effect was not shown for adults, although both infants and adults were better at detecting out-of-tune notes in the context of a familiar nonequal-step scale (the Western major scale). Although detecting out-of-tune notes is an important aspect of music perception, other processes, such as encoding a tonal hierarchy or syntactic regularities from a set of melodies, constitute higher-level core elements of music processing (34).

A tonal hierarchy results from a distribution of notes that creates the perception that certain tones are more prominent, stable, or structurally significant (3537). This representation is developed early in life through musical exposure (38, 39) but can also occur rapidly and incidentally for unfamiliar musical systems (40, 41). Previous studies suggest that the brain is able to encode a set of rules that define well-formed melodies from note sequences displayed by a corpus of melodies (42). Loui et al. (43) demonstrated that listeners can learn syntactic regularities and develop melodic expectancies from a highly unconventional musical system (featuring nonoctave-repeating scales) after only a short exposure period. However, it has been noted that some features in melodic structures, such as those identified by Narmour’s implication–realization model (44), may facilitate the learning of unfamiliar musical systems and the formation of melodic expectancies, suggesting that constraints are shaping musical systems (45).

In this study, we tested the hypothesis that the intervallic structure of musical scales also affects the ease with which listeners acquire knowledge of unfamiliar musical systems. Given the rarity of uniformly symmetric scales in musical cultures, we hypothesized that this specific intervallic structure may serve as an impediment to ease of music processing. Specifically, we sought to examine the effect of intervallic uniqueness by testing four different types of symmetry structures that entail various degrees of uniqueness (29). Asymmetric scales (examined in experiments 1, 2, and 3) and reflective–symmetric scales (examined in experiment 2) both satisfy the uniqueness property. Rotational–symmetric scales (experiments 1 and 3) possess an intermediate level of uniqueness: Each tone has a unique set of relations with only some of the other tones. Uniform–symmetric scales (experiments 1, 2, and 3) fail to satisfy the uniqueness property. We hypothesized that musical systems built from asymmetric and reflective–symmetric scales would result in better learning of musical structure than those from the rotational–symmetric and uniform–symmetric scales because the former fully satisfy the uniqueness property. By contrasting listener performance linked to melodic processing (namely tonal hierarchy and melodic regularity learning) as a function of intervallic organization, this work sought to demonstrate that cognitive constraints actively shape the structure of musical scales (46).

Results

Three experiments were conducted to test our hypothesis concerning the facilitating effects of nonuniform scales. All three experiments explored listener processing of artificial, unfamiliar musical systems using scales with three different levels of symmetry: One scale was uniformly symmetric, one was either rotationally or reflectively symmetric, and the last scale was asymmetric. Experiments 2 and 3 addressed certain problems with the stimulus design of experiment 1, expanded the musical systems beyond standard 12-tone equal temperament (12-TET), and streamlined the experimental procedure. The “middle” symmetry condition varied across experiments; it was rotational in experiments 1 and 3 and reflective in experiment 2. Schematic diagrams of the scales used in all three experiments are shown in Fig. 1.

Fig. 1.

Schematic representations of experiment scales. (A) Circular diagrams of the three 12-tone equal-tempered (12-TET) scales used in experiment 1. (B) Circular diagrams of the six scales used in experiment 2. Top shows 12-TET scales. Bottom shows 14-TET scales. (C) Circular diagrams of the six scales used in experiment 3. Top shows 12-TET scales. Bottom shows 16-TET scales.

The general paradigm for the experiments consisted of presenting exposure melodies (all composed of pure tones) to participants to provide them an opportunity to implicitly learn features of the musical systems. The general assumption was that listeners would demonstrate enhanced learning in proportion to the degree of uniqueness of the scale. In experiments 2 and 3, listeners were asked to indicate whether a melody sounded familiar or not after the exposure phase. The design of experiment 1 was more complex and included a probe-tone task and a set of recognition–generalization tasks to assess how much of the novel musical structure listeners had learned from the exposure phase. Both recognition and generalization tasks were included in experiment 1 to match the experimental design of prior work (43). The different types of tasks corresponded to the three aspects of music cognition that were examined in experiment 1: tonal hierarchy, melodic memory, and melodic expectation. Tonal hierarchy corresponds to the fact that listeners, when exposed to a musical system, build representations of the relative fit of each note in the scale (47). This can be observed with a probe-tone task, where listeners provide “goodness-of-fit” ratings for probe tones preceded by a tonal context (36). Here, an exposure phase presented a set of melodies in which tones had a specific density distribution. This exposure phase was bookended by probe-tone ratings that could demonstrate how much participants had learned about the distribution of pitches in the exposure phase and how that learning was affected by the scale structure.

The recognition task tested memory for novel melodies composed from different scale structures. While we expected overall poor performance (48, 49), we hypothesized that memory performance could be modulated by the type of scale, favoring those that allow for a better encoding of intervallic relationships (50). The aspect of music cognition explored through the generalization task in experiment 1 and experiments 2 and 3 as a whole concerned the processing of regularities and the formation of melodic expectations (using artificial grammars; see Fig. 2). The ability to predict melodic sequences has been shown to play a central role in music perception across different cultures (51, 52) and has been successfully tracked from cortical activity (53). After exposure to melodies composed in a novel musical system, listeners were presented with melodies that were either consistent or inconsistent with the grammar of the system. In experiment 1, these melodies were presented in pairs and listeners had to choose the one that sounded more familiar. In experiments 2 and 3, melodies were presented individually and listeners had to report whether they were familiar or not.

Fig. 2.

Schematic representation of the grammars. (A) Experiment 1: The finite-state grammar designed to create melodies with a set of possible, equally probable transitions between notes of the scale. In this schematic representation, numbers refer to the position of notes in the scale. (B) The first-order Markov-chain grammar used to generate 12-TET melodies from six-note scales in experiments 2 and 3. Nodes represent scale notes; gray and green arrows represent the transitions between nodes used to generate exposure melodies; and red arrows represent two possible examples of “incorrect” transitions used to generate half of the test melodies. (C) The first-order Markov-chain grammar used to generate 14-TET melodies from seven-note scales in experiment 2. (D) The first-order Markov-chain grammar used to generate 16-TET melodies from eight-note scales in experiment 3. (E) Musical notation for two example melodies generated by a simplified implementation of the 12-TET Markov-chain grammar containing only correct transitions (Left) and one incorrect transition (Right).

Experiment 1.

Probe-tone task.

Listeners provided ratings (on a seven-point Likert scale) for how well each note in the Western chromatic scale (12-TET) fit in the context of each of the three test scales (asymmetric, rotational–symmetric, and uniform–symmetric). Each probe tone was presented twice in both the pre- and postexposure sessions. Preceding each probe tone was a sequence of all of the scale tones played in ascending or descending order. Fig. 3 shows the probe-tone fitness ratings averaged across participants for each pitch and each scale condition and for pre- and postexposure phases.

Fig. 3.

Results of experiment 1, probe-tone task. Shown is the average rating across participants for each note for preexposure (dotted line) and postexposure (dashed line) in each scale session. Error shading corresponds to standard error. The black solid line displays the density of each note across all melodies for each of the scales, computed using MELODIA, a Sonic Visualiser plugin (54).

At first glance, preexposure, postexposure, and pitch density profiles (Fig. 3) suggest that 1) participants were mostly able to differentiate between probe tones that were part of the scale and those that were not and 2) responses differed after the exposure phase. Average ratings from the preexposure phase (lighter shade, dotted line) suggest that listeners were already able to extract this information from the scale context presented, before hearing any exposure melodies.

An ordinal mixed-effects model analysis was conducted to determine the factors that predicted fitness ratings. The model included participant as a random effect and session (pre- or postexposure), scale (uniform–symmetric, rotational–symmetric, asymmetric), and probability (of the probe tone) as fixed effects. To make the model more interpretable, the probability values were rescaled from the range 0 to 1 to the range 0 to 10 so that a one-unit increase was equivalent to a 10% increase in probability. The results of the ordinal model can be found in Table 1. The baseline references for the two categorical factors are preexposure for session and uniform–symmetric for scale. The most intuitive result is the proportional odds ratio (e^β in Table 1) associated with probability, which indicated that for every 10% increase in probe-tone probability, the odds of giving a higher fitness rating increased by 166% (a proportional odds ratio of 2.66). This relationship is also governed by the fact that pitch probabilities derived from the exposure melodies would be expected to affect only postexposure ratings, something that is evident in the significant interaction between session and probability: The odds of a one-point increase in fitness rating increased by an additional 57% in the postexposure session. Furthermore, the odds of a fitness rating being one point higher increased by 64% for asymmetric scale trials compared to uniform–symmetric trials (no significant difference was found between rotational–symmetric and the other two scale conditions). A detailed account of the model evaluation is provided in Materials and Methods.
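
For concreteness, the model can be expressed with the clmm function from the R package ordinal; this is a minimal sketch, assuming a long-format data frame ratings whose column names (rating, probability, session, scale, participant) are illustrative rather than the authors' actual variable names.

```r
# Cumulative-link (ordinal) mixed-effects model for the probe-tone ratings.
library(ordinal)

ratings$rating <- factor(ratings$rating, ordered = TRUE)  # 1-7 fitness rating
ratings$prob10 <- ratings$probability * 10                # rescale to 0-10

m_probe <- clmm(rating ~ session * scale * prob10 + (1 | participant),
                data = ratings, link = "logit")
summary(m_probe)
exp(coef(m_probe))  # exponentiated coefficients give the PORs of Table 1
```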

Table 1.

Ordinal mixed-effects model for probe-tone task

| Variable | Variance | SD | β | SE β | Z | P | e^β (POR) | 95% CI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Random effects |  |  |  |  |  |  |  |  |
| Subject | 0.56 | 0.75 |  |  |  |  |  |  |
| Fixed effects |  |  |  |  |  |  |  |  |
| Session[PostExp] |  |  | −0.342 | 0.197 | −1.741 | 0.082 | 0.71 | [0.48, 1.04] |
| Scale[Asymmetric] |  |  | 0.495 | 0.195 | 2.536 | 0.011* | 1.64 | [1.12, 2.40] |
| Scale[RotationalSymmetric] |  |  | 0.051 | 0.194 | 0.265 | 0.791 | 1.05 | [0.72, 1.54] |
| Probability |  |  | 0.980 | 0.118 | 8.272 | <0.001* | 2.66 | [2.11, 3.36] |
| Session[PostExp] × Scale[Asymmetric] |  |  | −0.426 | 0.276 | −1.541 | 0.123 | 0.65 | [0.38, 1.12] |
| Session[PostExp] × Scale[RotationalSymmetric] |  |  | −0.077 | 0.280 | −0.276 | 0.783 | 0.93 | [0.53, 1.60] |
| Session[PostExp] × Probability |  |  | 0.449 | 0.171 | 2.635 | 0.008* | 1.57 | [1.12, 2.19] |
| Scale[Asymmetric] × Probability |  |  | 0.004 | 0.170 | 0.026 | 0.979 | 1.00 | [0.72, 1.40] |
| Scale[RotationalSymmetric] × Probability |  |  | −0.197 | 0.166 | −1.183 | 0.237 | 0.82 | [0.59, 1.14] |
| Session[PostExp] × Scale[Asymmetric] × Probability |  |  | 0.073 | 0.244 | 0.298 | 0.765 | 1.08 | [0.67, 1.74] |
| Session[PostExp] × Scale[RotationalSymmetric] × Probability |  |  | 0.013 | 0.241 | 0.055 | 0.956 | 1.01 | [0.63, 1.62] |

Number of observations = 1,727. PostExp, postexposure; probability, probability of probe tone; CI, confidence interval; POR, proportional odds ratio. *P < .05.

Recognition–generalization task.

Mean d′ values for each scale condition broken down by task type are shown in Fig. 4. There were two types of trials, one testing recognition and the other testing generalization. In each trial, one of the melodies presented was taken from the exposure phase. The other was either a new melody generated from a different grammar (generalization task) or a new melody generated from the original grammar (recognition task). Participants were asked to choose whether the first or the second melody sounded more familiar. Listeners performed poorly on the recognition task, as indicated by the low d′ values in all symmetry conditions.

Fig. 4.

Results of experiment 1, recognition and generalization task. Shown are d′ values for recognition (Left) and generalization (Right) tasks. d′ values were computed and averaged across participants for asymmetric (blue), rotational–symmetric (red), and uniform–symmetric (yellow) scales. Error bars correspond to standard error.

A generalized linear mixed-effects model (GLMM) with a probit link function was used to further analyze the data. To utilize a signal detection theory paradigm, the actual response (as opposed to correct/incorrect) served as the binomial dependent variable (labeled as ChoseSecond, indicating whether the first or the second melody was chosen as familiar). The glmer function in the R package lme4 was used to implement the model. The model included participant as a random effect and scale, IsSecond (whether or not the familiar melody was presented second), and task (generalization, recognition) as fixed effects. The baseline references for the model were Scale[UniformSymmetric] and Task[Generalization]. The results are shown in Table 2. A detailed account of the model evaluation is provided in Materials and Methods.
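
A minimal sketch of this model is shown below; the trial-level data frame trials and its column names are assumptions made for illustration, and the formula mirrors the fixed-effects structure in Table 2.

```r
# Signal detection GLMM: with a probit link, the IsSecond coefficient
# estimates sensitivity (d') and the intercept estimates response bias.
library(lme4)

m_rg <- glmer(ChoseSecond ~ IsSecond * (Scale + Task) + (1 | Participant),
              data = trials, family = binomial(link = "probit"))
summary(m_rg)
fixef(m_rg)["IsSecond"]  # d' at the baseline reference levels
```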

Table 2.

Mixed-effects model for the recognition–generalization task

| Variable | Variance | SD | β | SE β | Z | P | 95% CI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Random effects |  |  |  |  |  |  |  |
| Participant | 0.003 | 0.057 |  |  |  |  |  |
| Fixed effects |  |  |  |  |  |  |  |
| Intercept |  |  | −0.002 | 0.029 | −0.07 | 0.945 | [−0.06, 0.06] |
| IsSecond |  |  | 0.611 | 0.048 | 12.68 | <0.001* | [0.52, 0.71] |
| Scale[Asymmetric] |  |  | −0.032 | 0.059 | −0.55 | 0.585 | [−0.15, 0.08] |
| Scale[RotationalSymmetric] |  |  | −0.006 | 0.059 | −0.10 | 0.919 | [−0.12, 0.11] |
| Task[Recognition] |  |  | 0.096 | 0.048 | 1.98 | 0.048* | [0.001, 0.19] |
| IsSecond × Scale[Asymmetric] |  |  | 0.494 | 0.118 | 4.20 | <0.001* | [0.26, 0.72] |
| IsSecond × Scale[RotationalSymmetric] |  |  | 0.387 | 0.117 | 3.30 | <0.001* | [0.16, 0.62] |
| IsSecond × Task[Recognition] |  |  | −0.931 | 0.097 | −9.63 | <0.001* | [−1.12, −0.74] |

Number of observations = 2,880; C statistic = 0.67; Somers’ Dxy = 0.35. *P < .05.

The intercept corresponds to a measure of the observer’s criterion c, or response bias. In this case, it was not significant, indicating that listeners were not inclined to choose either the first or the second melody more often. The IsSecond coefficient of 0.61 corresponds to sensitivity, or overall d′, for the task. The Scale[Asymmetric] and Scale[RotationalSymmetric] coefficients correspond to bias for choosing either the first or the second melody more often in their respective conditions, and neither one was significant. There was a significant effect of Task[Recognition], indicating there was slightly less bias for choosing the second melody in the recognition task compared to the generalization task; in other words, there was a relatively smaller proportion of false alarms to misses in the recognition condition. The interactions are of primary interest in this model: They indicated that d′ values for the asymmetric and rotational–symmetric scale conditions were both significantly higher than for the uniform–symmetric condition. However, there was no significant difference between the asymmetric and rotational–symmetric conditions (this was determined when Scale[Asymmetric] was used as the baseline reference instead of Scale[UniformSymmetric]). The last interaction, between IsSecond and Task[Recognition], was also significant and indicated that sensitivity for the recognition task was significantly lower than for the generalization task; in other words, participants found the recognition task more difficult.

Experiment 1 produced promising results, but the methodology raised a number of questions, which led to some unresolved problems. First, the scales were all derived from the 12-TET tuning system on which Western music is based. Previous studies suggest that after a short exposure, listeners are able to learn new intervals based on non–12-TET musical systems (55, 56). Accordingly, experiments 2 and 3 featured scales based on 14-TET and 16-TET systems to extend the main findings of the scale-structure effect to non-Western tonal systems and to reduce any unwanted effects resulting from scale familiarity.

Another possible confound in experiment 1 was an artifact of how the melodies were generated: The same grammar with a fixed note/node mapping was used for all participants and, in the process, produced a repeated set of distinct and fixed melodic intervals. Some intervals, such as the tritone (six semitones) or perfect fifth (seven semitones), are highly distinctive or familiar and could potentially influence melodic learning. The transition between notes 1 and 4 in the grammar was a tritone for all three scales, and the transition between notes 2 and 5 was a tritone for the uniform–symmetric and rotational–symmetric scales. The perfect fifth was present between notes 2 and 5 of the asymmetric scale and notes 1 and 5 of the rotational–symmetric scale but was never present in melodies derived from the uniform–symmetric scale. Given the relative (yet not systematic) prevalence of consonant intervals across musical cultures (13), this also could have potentially facilitated melodic learning in the asymmetric and rotational–symmetric cases.

Altogether, the way the grammar was implemented in experiment 1 created a fixed distribution of intervals that consisted of frequent and distinct melodic leaps that might have served as more obvious “signposts” for recognition and generalization rather than the scale structure itself. In experiments 2 and 3, the note/node combinations in the grammar were systematically randomized to produce a varied set of intervals for each participant, thus preventing unwanted effects from recurring intervals. Another issue was that the finite-state grammar used in experiment 1 produced noticeably unmusical melodies. In experiments 2 and 3, this problem was mitigated by using a first-order Markov chain to generate melodies, resulting in more complex transition probabilities that in turn yielded more ecologically valid melodies. Finally, the parallel designs of experiments 2 and 3 allowed for more direct comparisons between rotational and reflective symmetry in addition to the baseline uniform–symmetric and asymmetric comparisons.

Experiment 2.

Similar to the experiment 1 recognition–generalization task, scales with three levels of symmetry were used to generate melodies in experiment 2. In this experiment, the middle symmetry condition differed, featuring reflective symmetry instead of rotational symmetry (Fig. 1B). Additionally, two different tuning systems were used, 12-TET and 14-TET, resulting in six different scales used to generate melodies for six separate exposure/test sessions that took place on 2 different days (one day for each tuning system). Similar to experiment 1, an exposure phase preceded a test phase in which participants were presented with new melodies derived from either the exposure grammar or a slightly different one (Fig. 2 B and C). Listeners were asked to report whether each test melody seemed familiar or unfamiliar with respect to what they had heard in the exposure phase. The mean d′ values for all symmetry levels in each tuning system are shown in Fig. 5A. For both tuning systems, the asymmetric condition had the highest d′ values and the uniform–symmetric condition the lowest.
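
The d′ values shown in Fig. 5 can be recomputed directly from hit and false alarm rates; below is a minimal sketch using a standard log-linear correction, assuming a trial-level data frame test whose column names are illustrative.

```r
# d' from hit and false alarm counts; adding 0.5 (log-linear correction)
# avoids infinite z scores when a rate is exactly 0 or 1.
dprime <- function(hits, fas, n_signal, n_noise) {
  qnorm((hits + 0.5) / (n_signal + 1)) - qnorm((fas + 0.5) / (n_noise + 1))
}

# Example: d' for one condition (column names are assumptions)
with(subset(test, scale == "asymmetric" & tuning == "12TET"),
     dprime(sum(said_familiar & grammar == "correct"),
            sum(said_familiar & grammar == "incorrect"),
            sum(grammar == "correct"), sum(grammar == "incorrect")))
```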

Fig. 5.

Results of experiments 2 and 3. d′ values (calculated independently of the models) are averaged across participants by symmetry condition: asymmetric (blue), reflective–symmetric (purple), rotational–symmetric (red), and uniform–symmetric (yellow). Error bars correspond to standard error. (A) Experiment 2 d′ values for the 12-TET and 14-TET scales. (B) Experiment 3 d′ values for the 12-TET and 16-TET scales.

As in the analysis for the recognition–generalization test in experiment 1, a GLMM with response as the dependent variable was used to analyze the data in experiment 2. The model included participant as a random effect and IsFamiliar (familiar grammar or new grammar), scale (uniform–symmetric, reflective–symmetric, asymmetric), and tuning (12-TET or 14-TET) as fixed effects. The results of the model are shown in Table 3. The baseline references for the model were Scale[ReflectiveSymmetric] and Tuning[12TET]. A detailed account of the model evaluation is provided in Materials and Methods.

Table 3.

Experiment 2 results

| Variable | Variance | SD | β | SE β | Z | P | 95% CI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Random effects |  |  |  |  |  |  |  |
| Participant | 0.141 | 0.375 |  |  |  |  |  |
| Fixed effects |  |  |  |  |  |  |  |
| Intercept |  |  | 0.306 | 0.083 | 3.68 | <0.001* | [0.14, 0.48] |
| IsFamiliar |  |  | 1.113 | 0.028 | 39.86 | <0.001* | [1.06, 1.17] |
| Scale[UniformSymmetric] |  |  | −0.081 | 0.033 | −2.44 | 0.015* | [−0.15, −0.02] |
| Scale[Asymmetric] |  |  | −0.096 | 0.034 | −2.80 | 0.005* | [−0.16, −0.03] |
| Tuning[14TET] |  |  | 0.135 | 0.028 | 4.88 | <0.001* | [0.08, 0.19] |
| IsFamiliar × Scale[UniformSymmetric] |  |  | −0.271 | 0.067 | −4.07 | <0.001* | [−0.40, −0.14] |
| IsFamiliar × Scale[Asymmetric] |  |  | 0.368 | 0.069 | 5.35 | <0.001* | [0.23, 0.50] |
| IsFamiliar × Tuning[14TET] |  |  | −0.005 | 0.055 | −0.09 | 0.930 | [−0.11, 0.10] |

Number of observations = 10,080; C statistic = 0.78; Somers’ Dxy = 0.56. *P < .05.

The intercept was significant, indicating participants were generally biased toward choosing “familiar” as a response. The coefficient for IsFamiliar (1.11) corresponds to the overall d′. The corresponding values for Scale[Asymmetric], Scale[UniformSymmetric], and Tuning[14TET] were also significant, indicating there was slightly less bias in responses to melodies in the asymmetric and uniform–symmetric scale conditions compared to reflective–symmetric and more bias in the 14-TET condition compared to 12-TET. However, there was no significant difference in sensitivity between the two tuning conditions. The significant interactions show that there were differences in sensitivity between all scale conditions, with uniform–symmetric having the lowest d′, asymmetric the highest, and reflective–symmetric in between. Consistent with experiment 1, there was again a significant difference between the uniform–symmetric condition and the two other conditions.

Experiment 3.

Experiment 3 was conducted as a complement to experiment 2. The procedure was identical to experiment 2, but it featured a rotational–symmetric scale instead of a reflective–symmetric one in addition to the uniform–symmetric and asymmetric scales (similar to experiment 1; Fig. 1C). The nonstandard tuning condition in experiment 3 was 16-TET instead of 14-TET. This change was made because it was not possible to create a rotational–symmetric scale that matched the seven-note uniform–symmetric scale in 14-TET. The 16-TET system allowed for eight-note uniform–symmetric and rotational–symmetric scales. The grammars used to generate the melodies in experiment 3 are shown in Fig. 2 B–D.

The mean d′ values for all symmetry levels in each tuning system are shown in Fig. 5B. Again, for both tuning systems, the asymmetric condition had the highest d′ values. In the case of the 12-TET scales, the rotational–symmetric d′ was higher than the uniform–symmetric d′, while for the 16-TET scales, the reverse held. The same mixed-effects model from experiment 2 was used to analyze the data from experiment 3. The model included participant as a random effect and IsFamiliar (familiar grammar or new grammar), scale (uniform–symmetric, rotational–symmetric, asymmetric), and tuning (12-TET or 16-TET) as fixed effects. The results of the model are shown in Table 4. The baseline references in this case were Scale[Asymmetric] and Tuning[12TET]. A detailed account of the model evaluation is provided in Materials and Methods.

Table 4.

Experiment 3 results

| Variable | Variance | SD | β | SE β | Z | P | 95% CI |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Random effects |  |  |  |  |  |  |  |
| Participant | 0.181 | 0.425 |  |  |  |  |  |
| Fixed effects |  |  |  |  |  |  |  |
| Intercept |  |  | 0.481 | 0.096 | 5.01 | <0.001* | [0.28, 0.68] |
| IsFamiliar |  |  | 0.785 | 0.029 | 27.43 | <0.001* | [0.73, 0.84] |
| Scale[UniformSymmetric] |  |  | 0.050 | 0.035 | 1.43 | 0.152 | [−0.02, 0.12] |
| Scale[RotationalSymmetric] |  |  | 0.042 | 0.035 | 1.20 | 0.230 | [−0.03, 0.11] |
| Tuning[16TET] |  |  | −0.071 | 0.028 | −2.50 | 0.012* | [−0.127, −0.02] |
| IsFamiliar × Scale[UniformSymmetric] |  |  | −0.242 | 0.070 | −3.48 | <0.001* | [−0.38, −0.11] |
| IsFamiliar × Scale[RotationalSymmetric] |  |  | −0.228 | 0.070 | −3.28 | <0.001* | [−0.36, −0.09] |
| IsFamiliar × Tuning[16TET] |  |  | −0.287 | 0.057 | −4.04 | <0.001* | [−0.40, −0.18] |

Number of observations = 9,600; C statistic = 0.74; Somers’ Dxy = 0.48. CI, confidence interval. *P < .05.

The intercept was significant, indicating participants were biased toward choosing “familiar” as a response. The overall d′ was 0.79, which is lower than in experiment 2. There was no significant difference in bias between any of the scale conditions (including between Scale[RotationalSymmetric] and Scale[UniformSymmetric], which was determined by using Scale[UniformSymmetric] as a baseline). There were significant differences in both bias and sensitivity between the two types of tuning: less bias and a marked decrease in sensitivity for 16-TET. The other interactions indicate that there was significantly higher sensitivity for the asymmetric scale compared to both the uniform–symmetric and rotational–symmetric scales. There was no significant difference in sensitivity between the uniform–symmetric and rotational–symmetric conditions (again determined by using Scale[UniformSymmetric] as a baseline).

Discussion

The findings we present suggest that the prevalence of nonuniform scales across musical cultures may be grounded in the cognitive benefits that nonuniformity confers on the position finding of tones (27, 28). Butler and Brown (28) suggest that the intervallic structure contained in scales provides perceptual anchors that help listeners orient in a circular tonal space (35) and that these anchors are manifested in the balance between rare and common intervals. This could be an epiphenomenon of the uniqueness property of a scale: Each tone in such a scale has a unique set of relations with the others and therefore has a potentially unique musical role (30). Balzano noted that “a melody based on a scale satisfying Uniqueness should be easier for a perceiver to deal with because the notes of the melody are individuated not only by their particular frequency locations, but also by their interrelations with one another” (ref. 29, p. 326).

The results of the current work suggest that the uniqueness property in musical scales does indeed enhance performance on a task central to music cognition: melodic expectancy encoding. This study also explored the impact of scale structure on tonal hierarchy perception. When listening to music, people are exposed to statistical distributions of musical events (e.g., notes or chords) and implicitly build a representation of the hierarchical system of tones (47). These representations are acquired rapidly when listeners are exposed to a new musical system (40, 41, 57), even when the system’s structural properties are not found in any referenced musical culture (43). Results from the probe-tone task in experiment 1 are consistent with these previous findings. We observed that listeners were able to extract some information about the tonal structure from hearing the scale alone, even before the exposure phase. The improvement from pre- to postexposure ratings appeared larger for the asymmetric scale, but these results were not strong enough to support the claim that this difference stemmed from a higher degree of uniqueness.

The results of the recognition–generalization test in experiment 1 showed that overall performance was better for the two nonuniform scales compared to the uniform one. This was driven in part by the strong performances on the generalization task. The recognition task results, on the other hand, revealed that listeners did not form robust memories of particular melodies. This is consistent with results reported in Loui et al. (43) (second experiment), who noted that musicians performed poorly on the recognition task in comparison to the generalization task. In the case of the asymmetric scale, performance on the recognition task significantly differed from chance level. This might have resulted from either the structure of the scale or the particular set of intervals defined by both the scale and the grammar. In other words, it may be the case that melodies generated from the asymmetric scale displayed a pattern of intervals that was easier to memorize. Dowling’s model of memory for melodies proposes that the formation of robust representations of melodies depends both on scale structure and the set of intervals (50). Dowling and Bartlett (58) also demonstrated that intervals are better stored in long-term memory than other features of melodic sequences. This possible confound motivated the design of experiments 2 and 3, in which sets of intervals were randomly generated for each participant to better assess the effect of scale structure alone.

The results of experiments 2 and 3 supported the hypothesis that the uniqueness property was a facilitating factor in learning musical structure. When listeners were asked to identify whether melodies generated by the exposure grammar or a new grammar sounded familiar or not, their performance on the task was significantly better for the scales that exhibited intervallic uniqueness (asymmetric and reflective–symmetric) than for the scales that did not meet or only partially met the criterion (uniform–symmetric and rotational–symmetric). Furthermore, this pattern of performance for asymmetric and reflective–symmetric was similar regardless of the tuning context (12-TET vs. 14-TET); when a more complex model that included a three-way interaction between tuning, response, and scale condition was tested, it did not fit the data better than the model that included only interactions between response and the other factors (see Materials and Methods for details on model evaluation). The difference between reflective–symmetric and asymmetric observed in experiment 2 could originate from the associations that listeners make between pitch patterns and their inversions (59). It might be more difficult to orient in a space whose interval sequence is identical when traversed clockwise and counterclockwise, as with reflective symmetry. Listeners identify intervallic patterns with their inversions implicitly, and this type of association increases with musical training (60, 61). Finally, the overall lower sensitivity observed in experiment 3, driven mostly by the d′ values in the 16-TET scales, could result from the higher number of tones present in these octatonic scales.

Making predictions about upcoming events is a crucial cognitive ability shared across multiple cognitive domains (62). Music provides a unique perspective on how the mind extracts regularities from past input to form expectations about future events. This learning occurs through passive exposure (63) and is critical to a listener’s enjoyment of music; the interplay between the fulfillment and frustration of expectations is believed to give rise to emotional responses to music (62, 64). The rules recruited to produce musical predictions are derived from the joint contributions of Gestalt principles, shared with other cognitive domains, and statistical learning resulting from an individual’s specific exposure (65). For the most part, accurate predictions rely on a stable and well-characterized tonal space that allows listeners to orient in the inherently cyclical space of tonality resulting from octave equivalence (66). Here, we provided evidence that nonuniform scales and more precisely their uniqueness property support perceptual anchoring by specifying the tonal space in which listeners are immersed. An analogy to clarify this interpretation would be to imagine people placed in a circular room where they must remember a specific path between points distributed around this circular space. It would be easier if the points were organized in a manner such that their relation to all of the other points was uniquely defined; this would mean that the entire space could be anchored relative to a specific point (analogous to the tonic in a musical context).

However, there could be other factors beyond the scope of this study that facilitate processing of musical structure and that in turn might explain the prevalence of the uniqueness property in existing scales. For example, the presence of consonant intervals might play an important role (67), or there might be spectral qualities of pitch combinations that influence scale structures (68, 69). Maximal evenness of scales, a music-theoretic principle that describes the maximal spreading of tones around the octave (a concept not equivalent to uniform symmetry) (70), could also play a facilitating role and has also been observed in musical rhythms (71). Finally, the balance between common and rare intervals (respectively, the perfect fourth and the tritone in the Western diatonic scale) could enhance position finding (27, 28).

Overall, these results shed further light on the thorny question of how basic cognitive and sensory constraints intersect with cultural influences and result in universal features in human production (72, 73). From a cross-domain perspective, Chomsky’s (74) claim that innate principles underlie the many manifestations of language (the concept of a universal grammar) provided a framework for such a question and, in the process, ignited an unprecedented scientific explosion in the field of human cognition. The discovery of structural universals in language, such as the architecture of phonological structures (75), has tremendously impacted our understanding of the human mind (76). In the domain of music cognition, such efforts have been more limited but will surely benefit from the recent development of corpus-based statistical analysis (6, 19).

One example of how innate, sensory principles shape musical features concerns how the smallest intervals found between structural tones (e.g., semitones) (13) reflect typical frequency discrimination abilities (77). More recently, Jacoby and McDermott (16) found that listeners from different cultural backgrounds produce complex rhythms converging on integer–ratio temporal intervals. On the other hand, this line of research can also demonstrate the opposite—that seemingly pervasive elements of musical structure fail to point to universal aspects of perception and production (78, 79).

Ultimately, the difficulty in identifying putative musical universals lies in the definition of what is considered “universal.” Strictly speaking, no properties of language or music are universal per se; even in the absence of known counterexamples, it is not prudent to claim that exceptions do not exist or have never existed. The definition of universality then depends on one’s view of exceptions or, in other words, whether they invalidate or support a rule. Ellis’s research on tonality (80) is a good example of how different interpretations can be drawn from empirical evidence, based on how exceptions to universality are viewed. Ellis observed tunings for dozens of instruments from various cultures in Europe and Asia. He found that the intervals of an octave, a fifth, and a fourth were present in all musical tuning systems with the exception of instruments in Java. He thus concluded that this exception invalidated the idea of universal intervallic preferences. Although this exception was worth reporting and echoes recent empirical findings (78), this does not mean that the striking prevalence of octaves, fifths, and fourths across musical cultures should be entirely disregarded (81). The concept of a “statistical universal” (also referred to as “typological generalization” in language) is preferable, as it offers the flexibility needed when working with empirical data (19, 82).

The results of the current study provide evidence that the statistical universality of certain scale structures may result from a cognitive advantage in processing complex melodic patterns and encoding musical regularities. The symmetry properties discussed here merit serious consideration as a cognitive constraint that defines structural aspects of music. Although more behavioral and neurophysiological studies would be needed to further strengthen this hypothesis, the current findings strongly suggest that nonuniformity, which might enhance the position finding of tones in an octave diatonic context, provides cognitive benefits that could explain its pervasiveness. This study opens an avenue of research in the field of music cognition that should encourage systematic empirical investigations of recently identified statistical universals.

Materials and Methods

Experiment 1.

Experiment 1 was divided into three separate four-part sessions, one session per scale. The first and last parts of each session consisted of a probe-tone task to assess listeners’ perception of goodness of fit for all chromatic pitches in the context of the scale. In the second, listening-only part, participants were exposed to melodies generated from the scale by an artificial grammar. In the third part, listeners were tested on their ability to rapidly form explicit memories of melodies from the prior exposure phase (recognition) and implicitly acquire syntactic rules governing the transitions between notes in the melodies (generalization).

Participants.

Fourteen participants with self-reported normal hearing took part in experiment 1. One was excluded for completing only the first session and another for not following instructions (systematically giving the same response throughout one part of the experiment). Among the 12 remaining participants (5 females and 7 males, mean age = 24.33 y, SD = 4.86), 8 were enrolled in music performance and music technology programs at New York University; they all had formal musical training, had practiced an instrument for more than 7 y, and were still engaged in daily musical practice. None of the four remaining participants reported having received formal training in music. This binary distribution between musicians and nonmusicians was not by design but a function of the sample of participants who volunteered for the experiment. All gave informed consent to participate in the study and were paid for their participation. The protocols for experiment 1 as well as experiments 2 and 3 were approved by the New York University institutional review board (University Committee on Activities Involving Human Subjects).

Scales.

Throughout the experiment, participants listened to melodies derived from one of three 12-TET hexatonic scales, commonly known as the Prometheus, tritone, and whole-tone scales. The Prometheus scale is asymmetric, the tritone scale is rotationally symmetric, and the whole-tone scale is uniformly symmetric. Their intervallic structures are depicted in Fig. 1A. The uniform–symmetric scale had tones organized in a manner that did not satisfy the uniqueness property: No tone had a unique set of intervallic relations with the other tones. The asymmetric scale satisfied the uniqueness property: Each tone had a unique set of relations with the other tones in the scale. The rotational–symmetric scale had partial uniqueness: Each tone had a unique set of relations with only half of the other tones.
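
The three symmetry classes can be verified computationally. The sketch below finds the smallest transposition that maps each pitch-class set onto itself, assuming the standard pitch-class sets for these scales (Prometheus {0, 2, 4, 6, 9, 10}, tritone {0, 1, 4, 6, 7, 10}, and whole tone {0, 2, 4, 6, 8, 10}); the helper name is illustrative.

```r
# Smallest nonzero transposition mapping a scale onto itself; a period equal
# to the octave means no rotational symmetry (asymmetric scale).
rotational_period <- function(scale, octave = 12) {
  shifts <- seq_len(octave - 1)
  hits <- shifts[vapply(shifts, function(s)
    setequal((scale + s) %% octave, scale), logical(1))]
  if (length(hits) == 0) octave else min(hits)
}

rotational_period(c(0, 2, 4, 6, 9, 10))  # 12: asymmetric (Prometheus)
rotational_period(c(0, 1, 4, 6, 7, 10))  # 6: repeats at the tritone
rotational_period(c(0, 2, 4, 6, 8, 10))  # 2: uniform (whole tone)
```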

Grammar.

Melodies were composed of the notes within a given scale, and their construction was determined by a finite-state grammar inspired by Loui et al. (43). A schematic representation of this grammar is shown in Fig. 2A. Each circle corresponds to the position of a note in the scale. For example, “1” corresponds to the first note of the current scale (A, 220 Hz, for all scales) and “2” to the second note of the scale (B for the uniform–symmetric and asymmetric scales and B-flat for the rotational–symmetric scale). Arrows connecting circles determine the permissible transitions between notes, all being equally probable.
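
A minimal generation sketch is shown below; the transition table is a placeholder for illustration, not the actual grammar of Fig. 2A.

```r
# Melody generation from a finite-state grammar in which all outgoing
# transitions from a state are equally probable (placeholder transitions).
transitions <- list(`1` = c(2, 4), `2` = c(3, 5), `3` = c(1, 6),
                    `4` = c(2, 5), `5` = c(1, 6), `6` = c(3, 4))

generate_melody <- function(trans, start = 1, len = 8) {
  notes <- start
  for (i in seq_len(len - 1)) {
    opts <- trans[[as.character(notes[length(notes)])]]
    notes <- c(notes, opts[sample.int(length(opts), 1)])  # uniform choice
  }
  notes  # scale positions; mapped afterward to pitches (1 = A, 220 Hz)
}

generate_melody(transitions)
```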

Melodies.

All melodies were composed of 300-ms sine tones separated by 500-ms silent intervals, with onset ramp times of 50 ms. In the probe-tone phases, each of the 12 tones of the equal-tempered chromatic scale was used as a probe tone twice. Each probe pitch was preceded by an ascending version of the current scale in one trial and a descending version of the scale in another trial. The melody preceding the probe tone was therefore kept constant in terms of the pitches that it contained. In the recognition–generalization test phase, melodies were generated using the current scale and a corresponding finite-state grammar.
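
The tones themselves are straightforward to reconstruct. The sketch below synthesizes a 300-ms sine tone with a 50-ms linear onset ramp at a 44.1-kHz sampling rate; the matching offset ramp and the linear ramp shape are assumptions, since only the onset ramp time is specified.

```r
# Synthesize one stimulus tone: 300-ms sine wave with 50-ms on/off ramps.
make_tone <- function(freq, dur = 0.3, ramp = 0.05, fs = 44100) {
  t <- seq(0, dur, by = 1 / fs)
  x <- sin(2 * pi * freq * t)
  n <- round(ramp * fs)
  env <- c(seq(0, 1, length.out = n),    # onset ramp
           rep(1, length(x) - 2 * n),    # sustain
           seq(1, 0, length.out = n))    # offset ramp (assumed)
  x * env
}

tone <- make_tone(220)  # scale degree 1 (A, 220 Hz)
```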

Procedure.

Prior to participation, listeners were separated into two counterbalanced groups, each exposed to different versions of the finite-state grammars. For participants in group 1, the finite-state grammar was used to generate melodies in a forward direction (melodies were generated in a left-to-right direction). Participants in group 2 were presented with stimuli generated by the same finite-state grammar in a backward direction (the melodies were generated in a right-to-left direction). Since the grammars differed only in their intermediate notes, the beginnings and endings of melodies generated by grammars 1 and 2 were identical. In other words, melodies from grammar 2 were simply the same as grammar 1 melodies played backward.

The melodies were generated using Max/MSP (a visual programming language for sound and music) and then presented to listeners using a MATLAB interface. Generating the full set of melodies in advance ensured that all participants were exposed to the same complexity of note transitions during the recognition and generalization tasks. It also ensured that the frequency of occurrence of each note was the same regardless of which grammar was assigned, a crucial factor for the probe-tone ratings analysis. Participants were tested individually in a double-walled, sound isolation booth. Audio files of the stimuli were encoded at 16-bit resolution and a 44.1-kHz sampling rate and presented on Sennheiser HD 650 headphones. The sound presentation level was above 60 dB sound pressure level (SPL), A-weighted. Instructions were displayed on a computer screen, and participants’ responses were collected with a keyboard and mouse.

Listeners were first asked to assess goodness of fit for all 12 chromatic pitches in the context of the featured scale (probe-tone task). The stimuli were presented in random order and participants were asked to provide a rating from 1 to 7 (1 = lowest fit, 7 = highest fit) for how well the final pitch of each melody fitted with the prior context. Then in the subsequent exposure phase, 100 melodies generated with the designated grammar in the given scale were played once each in random order. Participants cycled through the melodies by clicking on a “Continue” button to hear the next melody. This was followed by a test phase (recognition–generalization test), in which listeners were presented with pairs of melodies and asked to judge which melody sounded more familiar. In each trial, one of the melodies was taken from the original set of melodies presented in the exposure phase and the other was either a new melody generated from a different grammar (generalization task) or a new melody generated from the original grammar (recognition task). The presentation orders of the two melodies in each trial and the type of task (generalization vs. recognition) were systematically shuffled and counterbalanced between participants. Informal debriefing of pilot participants suggested that all scales were perceived as equally unfamiliar. Given this observation, it was decided that formal ratings of familiarity would not be collected after each session to avoid the possibility of drawing unnecessary attention to the familiarity aspect of the scales in subsequent sessions.

Evaluation of the mixed-effects models.

Probe-tone task.

The proportional odds (parallel slopes) assumption for the ordinal model was evaluated by conducting a series of binary logistic regressions on the fitness ratings and then checking the equality of coefficients across various cut points. The cut points transformed the original ordinal values into binary variables equal to 0 if the original value was less than a particular fitness value and equal to 1 if it was greater than or equal to that value (for cut points ranging from 2 to 7 on the rating scale). This analysis indicated that the parallel slopes assumption held in all cases, with the possible exception of probe-tone probability. Given that the fitness values were (arguably) equally spaced and encompassed more than five categories, a linear model was employed in conjunction with the ordinal model. The results of this model were similar to those of the ordinal model, so they are not reported here (the only difference was one additional significant interaction between session and Scale[Asymmetric]). The ordinal model was evaluated by comparing it to a reduced model consisting only of the intercept and the random effect. A likelihood-ratio test indicated the full model was a better fit than the reduced model [χ²(11) = 554.00, P < 0.001], with pseudo-R² values calculated to describe relative fit (McFadden = 0.086, Cox and Snell = 0.274, Nagelkerke = 0.281).
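
A minimal sketch of this cut-point procedure, reusing the illustrative ratings data frame from the Results sketch above:

```r
# Parallel-slopes check: dichotomize the ratings at each cut point and
# compare the binary logistic coefficients across cut points.
cut_coefs <- sapply(2:7, function(k) {
  ratings$above <- as.numeric(as.numeric(ratings$rating) >= k)
  coef(glm(above ~ session * scale * prob10, data = ratings,
           family = binomial))
})
colnames(cut_coefs) <- paste0("cut", 2:7)
round(cut_coefs, 2)  # roughly constant rows support proportional odds
```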

Recognition–generalization model evaluation.

The GLMM was evaluated by comparing it to a null model consisting only of the intercept and the random effect and to a basic model with the intercept, the random effect, and IsSecond. The results of χ² tests comparing the selected model with the other models were all significant [null model, χ²(7) = 273, P < 0.001; basic model, χ²(6) = 117, P < 0.001]. The Bayesian information criterion (BIC) and Akaike information criterion (AIC) values were lower for the selected model than for the other models, indicating a better fit. The C statistic (area under the receiver operating characteristic curve) for the model was 0.67, indicating that the model’s discriminability was somewhat poor. Multicollinearity was tested using the kappa.mer and vif.mer functions in the mer-utils R library, with kappa = 1.8 and all variance inflation factors (VIFs) < 1.3, indicating there were no issues.
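
The nested-model comparison can be sketched as follows, continuing the illustrative m_rg model from the Results section:

```r
# Likelihood-ratio comparison of nested GLMMs; anova() on glmer fits
# reports chi-square tests alongside AIC and BIC for each model.
m_null  <- glmer(ChoseSecond ~ 1 + (1 | Participant),
                 data = trials, family = binomial(link = "probit"))
m_basic <- glmer(ChoseSecond ~ IsSecond + (1 | Participant),
                 data = trials, family = binomial(link = "probit"))
anova(m_null, m_basic, m_rg)
```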

Experiment 2.

Experiment 2 utilized a more flexible artificial grammar and extended the results to the learning of non-Western musical systems.

Participants.

Twenty-two participants with self-reported normal hearing took part in the experiment. One was excluded because the experimenter made an error in the presentation protocol. Among the 21 remaining participants (14 males and 7 females, mean age = 24.3 y, SD = 5.09), 15 had 5 or more y of formal musical training, and all of them were still engaged in daily musical practice. None of the six other participants reported having formal training, although two of them reported regularly performing as disc jockeys. This distribution of musical experience was a function of the sample of participants who volunteered for the study. All gave informed consent to participate in the study and were paid for their participation.

Scales.

Participants were presented with melodies generated from hexatonic or heptatonic scales in each of the three structure conditions: asymmetric, reflective–symmetric, and uniform–symmetric. Each hexatonic scale was composed of six tones in 12-TET, and each heptatonic scale was composed of seven tones in 14-TET. The unconventional octave division used for the heptatonic scales ensured that the melodies generated would be highly unfamiliar to all listeners, since 14-TET is not used at all in Western music. The three scale conditions were obtained by positioning tones in the 12-TET or 14-TET space in a manner that conformed to the different intervallic structural properties, as illustrated in Fig. 1B. The reflective–symmetric scales possessed an axis of symmetry; the sequence of intervals formed by starting at a particular tone and moving both clockwise and counterclockwise halfway around the circle (i.e., until meeting in the middle) was identical. However, each tone had a unique set of relations with all of the other tones when moving from one tone to another in the same direction (clockwise/counterclockwise, or up/down) across the octave span, thus satisfying the uniqueness property.

Grammar.

Melodies were composed of the tones within a given scale, and their construction was determined by a first-order Markov chain inspired by Rohrmeier et al. (83). Since two scale types were used (hexatonic, 12-TET and heptatonic, 14-TET), two different grammars were used as well, each including all of the tones in the scales. However, the complexity of both grammars was kept constant with respect to the transition probabilities that were used to generate the melodies. Schematic representations of the hexatonic and heptatonic grammars are shown in Fig. 2 B and C. Each node corresponds to a note in the scale. The correspondence between nodes and notes was randomized for each participant and for each structure condition. Arrows connecting nodes determine the permissible transitions between notes, along with the probability of transition. The “correct” version of the grammar was determined for each listener prior to the exposure phase. The “incorrect” version of the grammar was obtained by swapping nodes 3 and 4 and nodes 5 and 6, which introduced 10 possible wrong transitions. Melodies generated with the incorrect grammar contained a set of three transitions between tones that were never part of the melodies generated with the correct grammar (see simplified examples of melodies with correct and incorrect transitions in Fig. 2E).
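
A minimal sketch of this construction is given below; the transition matrix is a placeholder rather than the grammar of Fig. 2B, but it illustrates how swapping node pairs produces an incorrect grammar containing transitions never heard during exposure.

```r
# First-order Markov chain over six nodes; rows are current notes, columns
# candidate next notes (placeholder probabilities; node 6 is the final node,
# with no outgoing transitions).
P <- matrix(0, 6, 6)
P[1, 2] <- 1
P[2, c(3, 5)] <- 0.5
P[3, c(4, 6)] <- 0.5
P[4, 5] <- 1
P[5, c(2, 6)] <- 0.5

# "Incorrect" grammar: swap node pairs (here 3<->4 and 5<->6) by permuting
# the rows and columns of the transition matrix.
swap <- c(1, 2, 4, 3, 6, 5)
P_incorrect <- P[swap, swap]

generate <- function(P, start = 1, max_len = 15) {
  notes <- start
  repeat {
    row <- P[notes[length(notes)], ]
    if (sum(row) == 0 || length(notes) >= max_len) break  # final node or cap
    notes <- c(notes, sample.int(ncol(P), 1, prob = row))
  }
  notes  # melodies not reaching the final node would be regenerated
}

generate(P)            # exposure ("correct") melody
generate(P_incorrect)  # test melody containing never-heard transitions
```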

Melodies.

All melodies were composed of 500-ms sine tones to which a tapered-cosine (Tukey) window was applied. Tones were not separated by silence. During the exposure phase, 100 melodies were generated in real time and presented to listeners. Melodies were produced using the current tuning condition (hexatonic 12-TET or heptatonic 14-TET), the current structure condition (asymmetric, reflective–symmetric, or uniform–symmetric), and the correct version of the grammar. Melodies were constrained so that they did not exceed 15 tones and had to reach the final note, as defined by the grammar (Fig. 2 B and C). During the test phase, half of the melodies were produced in the same way using the correct grammar and the other half using the incorrect grammar.
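
As a synthesis sketch in R: each scale degree k in n-TET maps to a frequency f0 · 2^(k/n), and each 500-ms tone is shaped by a tapered-cosine window before concatenation with no intervening silence. The reference frequency f0, window parameter alpha, and example melody are assumptions for illustration, not values reported in the study.

```r
# Hand-rolled tapered-cosine (Tukey-style) window: raised-cosine ramps
# at both ends, flat in the middle.
tukey_window <- function(n, alpha = 0.1) {
  w <- rep(1, n)
  taper <- floor(alpha * (n - 1) / 2) + 1
  ramp <- 0.5 * (1 - cos(pi * seq(0, 1, length.out = taper)))
  w[1:taper] <- ramp
  w[(n - taper + 1):n] <- rev(ramp)
  w
}

render_tone <- function(degree, n_tet, f0 = 220, dur = 0.5, sr = 44100) {
  t <- seq(0, dur, length.out = round(dur * sr))
  freq <- f0 * 2^(degree / n_tet)
  sin(2 * pi * freq * t) * tukey_window(length(t))
}

# Concatenate tones with no silence between them, as in the experiment
melody_wave <- unlist(lapply(c(0, 2, 5, 7), render_tone, n_tet = 12))
```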

Procedure.

The experiment consisted of two separate sessions, one for the hexatonic, 12-TET scales and another for the heptatonic, 14-TET scales, conducted on 2 different days separated by 1 wk on average. The order of sessions was counterbalanced across subjects. Each session was divided into three parts, each corresponding to a symmetry condition; the order of conditions was randomized. In each part, listeners first completed an exposure phase during which they listened to 100 melodies. During this phase, melodies were generated in real time with the designated scale and grammar; only the correct version of the grammar was used to generate the exposure melodies. Listeners simply clicked the mouse to play the next melody. Immediately after the exposure phase came the test phase, during which 80 melodies were generated on the fly: half of them using the correct version of the grammar and the other half using the incorrect version. After each melody was played, listeners were asked to report whether it seemed familiar or unfamiliar with respect to what they had heard in the exposure phase. Participants were tested individually in a double-walled sound isolation booth. Audio was encoded at 16-bit resolution and a 44.1-kHz sampling rate and presented over Sennheiser HD 650 headphones at a level above 60 dB SPL (A-weighted). Instructions were displayed on a computer screen, and participants’ responses were collected with a keyboard and mouse. As in experiment 1, informal debriefing indicated that all scales were perceived as equally unfamiliar, although the heptatonic scales were reported as unconventional sounding. No formal familiarity ratings were collected after each session.
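
Continuing the synthesis sketch above, the reported encoding parameters correspond to writing the waveform as 16-bit audio at 44.1 kHz; the tuneR package is assumed here for illustration, as the study does not specify its playback toolchain.

```r
library(tuneR)  # Wave/writeWave assumed for illustration
wav <- Wave(left = round(melody_wave * 32000), samp.rate = 44100, bit = 16)
writeWave(wav, "melody_example.wav")
```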

Evaluation of the mixed-effects model.

The GLMM was evaluated by comparing it to 1) a null model consisting only of the intercept and the random effect; 2) a basic model with the intercept, the random effect, and IsFamiliar; and 3) a three-way interaction model with all of the terms. The results of χ2 tests comparing the selected model with the simpler models were significant [null model, χ2(7) = 1800, P < 0.001; basic model, χ2(6) = 126, P < 0.001]. There was no significant difference between the selected model and the full interaction model [χ2(4) = 8.53, P = 0.07]. The BIC values were lower for the selected model than for all other models, indicating a better fit. The AIC was also lower for the selected model compared to the null and basic models and nearly the same as that of the full interaction model (11,262 for the selected model versus 11,261 for the interaction model). The C statistic for the model was 0.78, indicating that the model’s discriminability was very good. Multicollinearity was assessed, with kappa = 1.7 and all VIFs ≤ 1.4, indicating there were no issues.
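
The borderline comparison can be reproduced directly from the reported chi-squared statistic and its degrees of freedom:

```r
# P value for the selected vs. full interaction model comparison above
pchisq(8.53, df = 4, lower.tail = FALSE)   # ~0.074
```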

Experiment 3.

The procedure in experiment 3 was identical to that in experiment 2, except that a rotational–symmetric scale was used instead of a reflective–symmetric scale, in addition to the uniform–symmetric and asymmetric scales.

Participants.

Twenty participants with self-reported normal hearing took part in the experiment (12 males and 8 females, mean age = 25 y, SD = 11.02). Among them, 12 had 5 or more y of formal musical training and 9 were still engaged in daily musical practice. This distribution of musical experience reflected the sample of participants who volunteered for the study. All gave informed consent to participate in the study and were paid for their participation.

Scales.

Participants were presented with melodies generated from hexatonic or octatonic scales in each of the three symmetry conditions: asymmetric, rotational–symmetric, and uniform–symmetric. Each hexatonic scale was composed of six tones in 12-TET, and each octatonic scale was composed of eight tones in 16-TET. The unconventional 16-TET tuning served the same purpose as 14-TET in experiment 2: testing the effect of scale structure in an unfamiliar tuning system. The three scale conditions were obtained by positioning tones in the 12-TET or 16-TET space to obtain different intervallic structural properties, as illustrated in Fig. 1C. As in experiment 1, the rotational–symmetric scales had partial uniqueness: Each tone had a unique set of relations with only half of the other tones.
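
Rotational symmetry can be checked in the same style as the uniqueness sketch above: a scale is rotational–symmetric if its pattern of steps maps onto itself under a rotation smaller than a full octave. The example below uses the familiar 12-TET octatonic (alternating 1–2 steps) scale for illustration, not the 16-TET scales of this experiment.

```r
# A scale is rotational-symmetric if its step pattern is invariant
# under some nontrivial cyclic rotation.
is_rotational <- function(degrees, n_tet) {
  steps <- diff(c(sort(degrees), min(degrees) + n_tet))
  n <- length(steps)
  any(sapply(1:(n - 1), function(k) {
    all(steps == steps[((seq_len(n) - 1 + k) %% n) + 1])
  }))
}

# Octatonic scale in 12-TET (steps 1,2,1,2,...): rotational-symmetric
is_rotational(c(0, 1, 3, 4, 6, 7, 9, 10), 12)   # TRUE
```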

Grammar.

In experiment 3, Markov-chain grammars were designed as described for experiment 2. Since two scale types were used (hexatonic 12-TET and octatonic 16-TET), two different grammars were used as well, each including all of the tones in its scale. However, the complexity of the two grammars was kept constant with respect to the transition probabilities used to generate the melodies. The grammar used for the hexatonic 12-TET melodies was the same as in experiment 2 and is shown in Fig. 2B. The grammar for the octatonic 16-TET melodies is shown in Fig. 2D.

Melodies.

The melodies were generated as described for experiment 2. For all tuning (hexatonic 12-TET and octatonic 16-TET) and structure conditions (asymmetric, rotational–symmetric, and uniform–symmetric), melodies were constrained so that they did not exceed 15 tones and had to reach the final note, as defined by the grammar. During the test phase, half of the melodies were produced using the correct grammar and half of the melodies were produced using the incorrect grammar.

Procedure.

The procedure was identical to that in experiment 2. The two sessions were conducted on 2 different days and were separated by 1 wk on average: one session for the hexatonic, 12-TET scales and the other session for the octatonic, 16-TET scales. Again, informal debriefing with participants indicated that all scales were perceived as equally unfamiliar, although the task for the 16-TET tuning was reported as particularly difficult to perform. No formal ratings of familiarity were collected after each session.

Evaluation of the mixed-effects model.

The GLMM was evaluated by comparing it to 1) a null model consisting only of the intercept and the random effect; 2) a basic model with the intercept, the random effect, and IsFamiliar; and 3) a three-way interaction model with all of the terms. The results of χ2 tests comparing the selected model with the other models were all significant [null model, χ2(7) = 823, P < 0.001; basic model, χ2(6) = 48.5, P < 0.001; full interaction model, χ2(4) = 15.9, P = 0.003]. The BIC values were lower for the selected model than for all other models, indicating a better fit. The AIC values were also lower for the selected model compared to the null and basic models, although higher compared to the full interaction model (difference = +8). The selected model was nevertheless chosen over the full interaction model on the basis of the BIC difference (21). The C statistic for the model was 0.74, indicating that the model’s discriminability was quite good. Multicollinearity was assessed, with kappa = 1.7 and all VIFs ≤ 1.4, indicating there were no issues.
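
The divergence between AIC and BIC here follows directly from their definitions; as a sketch (the logLik and nobs accessors work on fitted merMod objects):

```r
# AIC and BIC from a model's log-likelihood. BIC's log(n)-per-parameter
# penalty exceeds AIC's 2-per-parameter penalty whenever n > e^2 (~7),
# which is why BIC can favor the selected model while AIC slightly
# favors the full interaction model.
ic <- function(model) {
  ll <- logLik(model)
  k  <- attr(ll, "df")
  n  <- nobs(model)
  c(AIC = -2 * as.numeric(ll) + 2 * k,
    BIC = -2 * as.numeric(ll) + log(n) * k)
}
```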

Acknowledgments

We thank Natalie Wu and Emma Ning for help with data collection. This work has also benefited from discussions with Shihab Shamma, Fred Lerdahl, and Michael Seltenreich. We thank David Poeppel for critical comments on the manuscript.

Footnotes

The authors declare no competing interest.

This article is a PNAS Direct Submission. C.L.K. is a guest editor invited by the Editorial Board.

*Debussy’s use of the whole-tone scale was supposedly inspired by his visit to the Exposition Universelle held in Paris in 1889, which featured Gamelan music from Java (26). The whole-tone scale has also been used in compositions by Béla Bartók, Olivier Messiaen, Maurice Ravel, and Lili Boulanger.

Data Availability

The datasets collected and analyzed in the reported studies, as well as examples of the stimuli used, are available on the Open Science Framework (OSF) (DOI: 10.17605/OSF.IO/XEQPA). Behavioral data, sound files, code, and data tables have also been deposited in the OSF archive (84).

References

1. Merriam A. P., Merriam V., The Anthropology of Music (Northwestern University Press, 1964).
2. Nettl B., The Study of Ethnomusicology: Thirty-Three Discussions (University of Illinois Press, 2015).
3. Blacking J., Venda Children’s Songs: A Study in Ethnomusicological Analysis (University of Chicago Press, 1995).
4. Fourer D., Rouas J.-L., Hanna P., Robine M., “Automatic timbre classification of ethnomusicological audio recordings” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR) (ISMIR, 2014), pp. 295–300.
5. Polak R., et al., Rhythmic prototypes across cultures: A comparative study of tapping synchronization. Music Percept. 36, 1–23 (2018).
6. Mehr S. A., et al., Universality and diversity in human song. Science 366, eaax0868 (2019).
7. Brown D. E., Human Universals (McGraw-Hill, New York, NY, 1991).
8. Hagen E. H., Bryant G. A., Music and dance as a coalition signaling system. Hum. Nat. 14, 21–51 (2003).
9. Trehub S. E., The developmental origins of musicality. Nat. Neurosci. 6, 669–673 (2003).
10. Herzog G., Music’s dialects: A non-universal language. Independent J. Columb. Univ. 6, 1–2 (1939).
11. List G., On the non-universality of musical perspectives. Ethnomusicology 15, 399–402 (1971).
12. Meyer L. B., Universalism and relativism in the study of ethnic music. Ethnomusicology 4, 49–54 (1960).
13. Ellis A. J., On the musical scales of various nations. J. Soc. Arts 33, 485–527 (1885).
14. Fritz T., et al., Universal recognition of three basic emotions in music. Curr. Biol. 19, 573–576 (2009).
15. Honing H., Cate C. T., Peretz I., Trehub S. E., Biology, cognition and origins of musicality [special issue]. Philos. Trans. Biol. Sci. 370, 1664 (2015).
16. Jacoby N., McDermott J. H., Integer ratio priors on musical rhythm revealed cross-culturally by iterated reproduction. Curr. Biol. 27, 359–370 (2017).
17. Juslin P. N., Sloboda J. A., “The past, present, and future of music and emotion research” in Handbook of Music and Emotion: Theory, Research, Applications, Juslin P. N., Sloboda J. A., Eds. (Oxford University Press, New York, NY, 2010), pp. 933–955.
18. Nettl B., An ethnomusicologist contemplates universals in musical sound and musical culture. Origins Music 3, 463–472 (2000).
19. Savage P. E., Brown S., Sakai E., Currie T. E., Statistical universals reveal the structures and functions of human music. Proc. Natl. Acad. Sci. U.S.A. 112, 8987–8992 (2015).
20. Panteli M., et al., Learning a Feature Space for Similarity in World Music (ISMIR, 2016).
21. Harvey P. H., et al., The Comparative Method in Evolutionary Biology (Oxford University Press, Oxford, UK, 1991), vol. 239.
22. Brown S., Jordania J., Universals in the world’s musics. Psychol. Music 41, 229–248 (2013).
23. Fink R., “The neanderthal flute and origin of the scale: Fang or flint? A response” in Studies in Music Archaeology III, Hickmann E., Kilmer A. D., Eichmann R., Eds. (Verlag Marie Leidorf, Rahden, 2003), pp. 83–87.
24. Whittall A., Tonality and the whole-tone scale in the music of Debussy. Music Rev. 36, 261–271 (1975).
25. Perle G., Serial Composition and Atonality: An Introduction to the Music of Schoenberg, Berg, and Webern (University of California Press, 1972).
26. Tamagawa K., Echoes from the East: The Javanese Gamelan and Its Influence on the Music of Claude Debussy (Lexington Books, 2019).
27. Browne R., Tonal implications of the diatonic set. Theory Only 5, 3–21 (1981).
28. Butler D., Brown H., Tonal structure versus function: Studies of the recognition of harmonic motion. Music Percept. 2, 6–24 (1984).
29. Balzano G. J., “The pitch set as a level of description for studying musical pitch perception” in Music, Mind, and Brain, Clynes M., Ed. (Plenum Press, New York and London, 1982), pp. 321–351.
30. Pearce M., The Group-Theoretic Description of Musical Pitch Systems (City University, London, UK, 2002).
31. Pressing J., Cognitive isomorphisms between pitch and rhythm in world musics: West Africa, the Balkans and Western tonality. Stud. Music 17, 38–61 (1983).
32. Messiaen O., Technique de Mon Langage Musical (Alphonse Leduc, 1944), vol. 1.
33. Trehub S. E., Schellenberg E. G., Kamenetsky S. B., Infants’ and adults’ perception of scale structure. J. Exp. Psychol. Hum. Percept. Perform. 25, 965 (1999).
34. Graves J. E., Micheyl C., Oxenham A. J., Expectations for melodic contours transcend pitch. J. Exp. Psychol. Hum. Percept. Perform. 40, 2338 (2014).
35. Krumhansl C. L., The psychological representation of musical pitch in a tonal context. Cognit. Psychol. 11, 346–374 (1979).
36. Krumhansl C. L., Shepard R. N., Quantification of the hierarchy of tonal functions within a diatonic context. J. Exp. Psychol. Hum. Percept. Perform. 5, 579 (1979).
37. Krumhansl C. L., Kessler E. J., Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychol. Rev. 89, 334 (1982).
38. Krumhansl C. L., Keil F. C., Acquisition of the hierarchy of tonal functions in music. Mem. Cognit. 10, 243–251 (1982).
39. Corrigall K. A., Trainor L. J., Enculturation to musical pitch structure in young children: Evidence from behavioral and electrophysiological methods. Dev. Sci. 17, 142–158 (2014).
40. Castellano M. A., Bharucha J. J., Krumhansl C. L., Tonal hierarchies in the music of north India. J. Exp. Psychol. Gen. 113, 394 (1984).
41. Rohrmeier M., Widdess R., Incidental learning of melodic structure of north Indian music. Cognit. Sci. 41, 1299–1327 (2017).
42. Moldwin T., Schwartz O., Sussman E. S., Statistical learning of melodic patterns influences the brain’s response to wrong notes. J. Cognit. Neurosci. 29, 2114–2122 (2017).
43. Loui P., Wessel D. L., Kam C. L. H., Humans rapidly learn grammatical structure in a new musical scale. Music Percept. 27, 377–388 (2010).
44. Narmour E., The “genetic code” of melody: Cognitive structures generated by the implication-realization model. Contemp. Music Rev. 4, 45–63 (1989).
45. Rohrmeier M., Cross I., Artificial grammar learning of melody is constrained by melodic inconsistency: Narmour’s principles affect melodic learning. PloS One 8, e66174 (2013).
46. Stevens C. J., Music perception and cognition: A review of recent cross-cultural research. Topic. Cognit. Sci. 4, 653–667 (2012).
47. Krumhansl C. L., Cuddy L. L., “A theory of tonal hierarchies in music” in Music Perception, Jones M. R., Fay R. R., Popper A. N., Eds. (Springer, 2010), pp. 51–87.
48. Lange K., Czernochowski D., Does this sound familiar? Effects of timbre change on episodic retrieval of novel melodies. Acta Psychol. 143, 136–145 (2013).
49. Halpern A. R., Bartlett J. C., “Memory for melodies” in Music Perception, Jones M. R., Fay R. R., Popper A. N., Eds. (Springer, New York, NY, 2010), pp. 233–258.
50. Dowling W. J., Scale and contour: Two components of a theory of memory for melodies. Psychol. Rev. 85, 341 (1978).
51. Eerola T., The Dynamics of Musical Expectancy: Cross-Cultural and Statistical Approaches to Melodic Expectations (Jyväskylän Yliopisto, 2003), vol. 9.
52. Pearce M. T., Wiggins G. A., Auditory expectation: The information dynamics of music perception and cognition. Topic. Cognit. Sci. 4, 625–652 (2012).
53. Di Liberto G. M., et al., Cortical encoding of melodic expectations in human temporal cortex. eLife 9, e51784 (2020).
54. Cannam C., Landone C., Sandler M., “Sonic Visualiser: An open source application for viewing, analysing, and annotating music audio files” in Proceedings of the 18th ACM International Conference on Multimedia, del Bimbo A., Chang S., Smeulders A., Eds. (Association for Computing Machinery, New York, NY, 2010), pp. 1467–1468.
55. Zatorre R. J., Delhommeau K., Zarate J. M., Modulation of auditory cortex response to pitch variation following training with microtonal melodies. Front. Psychol. 3, 544 (2012).
56. Leung Y., Dean R. T., The difficulty of learning microtonal tunings rapidly: The influence of pitch intervals and structural familiarity. Psychomusicol. Music Mind Brain 28, 50 (2018).
57. Creel S. C., Newport E. L., “Tonal profiles of artificial scales: Implications for music learning” in Proceedings of the 7th International Conference on Music Perception and Cognition, Stevens C., Burnham D., McPherson G., Schubert E., Renwick J., Eds. (Causal Productions, Adelaide, 2002), pp. 281–284.
58. Dowling W. J., Bartlett J. C., The importance of interval information in long-term memory for melodies. Psychomusicol. J. Res. Music Cognit. 1, 30 (1981).
59. Dowling W. J., Recognition of melodic transformations: Inversion, retrograde, and retrograde inversion. Percept. Psychophys. 12, 417–421 (1972).
60. Krumhansl C. L., Sandell G. J., Sergeant D. C., The perception of tone hierarchies and mirror forms in twelve-tone serial music. Music Percept. 5, 31–77 (1987).
61. Dienes Z., Longuet-Higgins C., Can musical transformations be implicitly learned? Cognit. Sci. 28, 531–558 (2004).
62. Huron D. B., Sweet Anticipation: Music and the Psychology of Expectation (MIT Press, 2006).
63. Tillmann B., Bharucha J. J., Bigand E., “Implicit learning of regularities in Western tonal music by self-organization” in Connectionist Models of Learning, Development and Evolution, French R. M., Sougné J. P., Eds. (Springer, 2001), pp. 175–184.
64. Salimpoor V. N., Zald D. H., Zatorre R. J., Dagher A., McIntosh A. R., Predictions and the brain: How musical sounds become rewarding. Trends Cognit. Sci. 19, 86–91 (2015).
65. Morgan E., Fogel A., Nair A., Patel A. D., Statistical learning and gestalt-like principles predict melodic expectations. Cognition 189, 23–34 (2019).
66. Kallman H. J., Octave equivalence as measured by similarity ratings. Percept. Psychophys. 32, 37–49 (1982).
67. Huron D., Interval-class content in equally tempered pitch-class sets: Common scales exhibit optimum tonal consonance. Music Percept. 11, 289–305 (1994).
68. Gill K. Z., Purves D., A biological rationale for musical scales. PloS One 4, e8144 (2009).
69. Purves D., Music as Biology (Harvard University Press, 2017).
70. Clough J., Douthett J., Maximally even sets. J. Music Theor. 35, 93–173 (1991).
71. Toussaint G. T., The Geometry of Musical Rhythm: What Makes a “Good” Rhythm Good? (CRC Press, 2019).
72. Ramsey G., The fundamental constraint on the evolution of culture. Biol. Philos. 22, 401–414 (2007).
73. Bohlman P. V., “Ontologies of music” in Rethinking Music, Cook N., Everist M., Eds. (Oxford University Press, 1999), pp. 17–34.
74. Chomsky N., Aspects of the Theory of Syntax (MIT Press, 1965), vol. 11.
75. Jakobson R., “Implications of language universals for linguistics” in Universals of Language, Greenberg J. H., Ed. (MIT Press, Cambridge, MA, 1963), pp. 263–278.
76. Bever T. G., The cognitive basis for linguistic structures. Cognit. Dev. Language 279, 1–61 (1970).
77. Zarate J. M., Ritson C. R., Poeppel D., Pitch-interval discrimination and musical expertise: Is the semitone a perceptual boundary? J. Acoust. Soc. Am. 132, 984–993 (2012).
78. McDermott J. H., Schultz A. F., Undurraga E. A., Godoy R. A., Indifference to dissonance in native Amazonians reveals cultural variation in music perception. Nature 535, 547–550 (2016).
79. Jacoby N., et al., Universal and non-universal features of musical pitch perception revealed by singing. Curr. Biol. 29, 3229–3243 (2019).
80. Helmholtz H., On the Sensations of Tone (Courier Corporation, 2013).
81. Bowling D. L., Hoeschele M., Gill K. Z., Fitch W. T., The nature and nurture of musical consonance. Music Percept. 35, 118–121 (2017).
82. Kiparsky P., et al., “Universals constrain change; change results in typological generalizations” in Linguistic Universals and Language Change, Good J., Ed. (Oxford University Press, 2008), pp. 23–53.
83. Rohrmeier M., Rebuschat P., Cross I., Incidental and online learning of melodic structure. Conscious. Cognit. 20, 214–222 (2011).
84. Pelofi C., Farbood M. M., Data: Asymmetry in scales enhances learning of new musical structures. Open Science Framework. https://osf.io/xeqpa/. Deposited 13 July 2021.
