Form and function in human song

Samuel A Mehr; Manvir Singh; Hunter York; Luke Glowacki; Max M Krasnow

doi:10.1016/j.cub.2017.12.042

. Author manuscript; available in PMC: 2019 Feb 5.

Published in final edited form as: Curr Biol. 2018 Jan 25;28(3):356–368.e5. doi: 10.1016/j.cub.2017.12.042

Form and function in human song

Samuel A Mehr ^1,^2,^3,^7,^8,^*, Manvir Singh ^4,^8,^*, Hunter York ⁴, Luke Glowacki ^5,⁶, Max M Krasnow ¹

PMCID: PMC5805477 NIHMSID: NIHMS930800 PMID: 29395919

Summary

Humans use music for a wide variety of social functions: we sing to accompany dance, to soothe babies, to heal illness, to communicate love, and so on. Across animal taxa, vocalization forms are shaped by their functions, including in humans. Here we show that vocal music exhibits recurrent, distinct, and cross-culturally robust form-function relations detectable by listeners across the globe. In Experiment 1, internet users (N = 750) in 60 countries listened to brief excerpts of songs, rating each song’s function on six dimensions (e.g., used to soothe a baby). Excerpts were drawn from a geographically-stratified pseudorandom sample of dance songs, lullabies, healing songs, and love songs recorded in 86 mostly small-scale societies, including hunter-gatherers, pastoralists, and subsistence farmers. Experiment 1 and its analysis plan were pre-registered. Despite participants’ unfamiliarity with the societies represented, the random sampling of each excerpt, their very short duration (14 s), and the enormous diversity of this music, the ratings demonstrated accurate and cross-culturally reliable inferences about song functions on the basis of song forms alone. In Experiment 2, internet users (N = 1000) in the United States and India rated three “contextual” features (e.g., gender of singer) and seven “musical” features (e.g., melodic complexity) of each excerpt. The songs’ contextual features were predictive of Experiment 1 function ratings, but musical features and the songs’ actual functions explained more variability in function ratings. These findings are consistent with the existence of universal links between form and function in vocal music.

eTOC

Mehr et al. show form-function associations in vocal music detectable by listeners worldwide. People in 60 countries heard songs from 86 societies. They inferred the functions of dance, lullaby, and healing from song forms alone. Ratings were near-identical across listener cohorts and were guided by the contextual and musical features of the songs.

Introduction

Research from across the biological sciences demonstrates that the features of auditory signals and other communicative behaviors are shaped by their intended outcomes [1–3]. For instance, as a general principle, low-frequency, harsh vocal forms with nonlinearities are expected to function in signaling hostility, because those features are correlated with increases in body size and larger animals tend to defeat smaller animals in conflicts [1,4]. This form-function relation is found in many vertebrates, e.g., in the cricket frog [5], river bullhead [6], sparrow hawk [7], and red deer [8] and it is salient enough that naïve listeners identify arousal levels from vocalizations in mammals, amphibians, and reptiles [9].

Similar form-function relations are present in the hostile vocalizations of humans [10,11] and in other domains of human vocal communication. Across 24 societies, the sounds of co-laughter between friends and strangers are distinguishable by acoustic features of the voice associated with arousal [12]; relationships exist between sound and meaning in the word-forms of thousands of human languages [13]; and intention categories in both infant- and adult-directed speech are identifiable from their vocal forms alone [14].

Music has been predicted to show form-function relationships in the contexts of dance [15,16], infant care [17], and ceremonial healing [18]: music used for each of these social functions is expected to show regularities in its form across cultures. In the field of music theory, “form” typically refers to the organization of composed music (e.g., the exposition, development, and recapitulation of “sonata form”). This is not what we mean by “form”. Here and throughout, we use “form” in a similar fashion to prior work concerning form and function in vocalization; that is, to refer to the acoustical properties of the vocalization. In vocal music, such forms include “contextual” features (e.g., gender of singer) and “musical” features (e.g., melodic complexity).

In the domain of emotion, listeners can accurately detect extra-musical information from music played in isolation. For instance, Canadians accurately detect intended emotions of joy, sadness, or anger in Hindustani ragas, despite being unfamiliar with the genre [19]. Similar effects are found with other music and with listeners from other societies [20,21], including in one non-industrialized society, the Mafa of Cameroon [22] (for review, see [23]). Emotion recognition in music could help to inform form-function inferences about music, but it is unknown whether such inferences exist and, if they do, whether they extend across the music of all cultures.

Studies of a collection of lullabies and love songs [24,25] provide some evidence for regularities in infant-directed songs across cultures. However, the songs therein were selected in part on the basis of their acoustic features, were only sampled from two categories of a much wider musical repertoire, and were not sampled systematically across cultures, which undermines any general inferences of universality in the forms of infant-directed songs. The last issue is common among cross-cultural studies of music, which tend either to study a small number of cultures or use otherwise unrepresentative samples. For instance, a study examining cross-cultural regularities in music [26] used the Garland Encyclopedia of Music, which samples irregularly across geographic regions, ethnolinguistic histories, and, crucially, the many social contexts in which music is found. Infant-directed songs constitute less than 5% of the music studied, despite infant-directed music being a common and likely universal form of musical expression [17]. Uneven sampling has the clear potential to bias general inferences from cross-cultural datasets. In the case of [26], the under-sampling of infant-directed songs skews any estimate of gender bias in music away from female singers.

While researchers have proposed a number of potential universals in music and musical behavior [27–29], many of which pertain directly to the possibility of links between form and function in music, testing them requires representative samples of music that span geographic, linguistic, and cultural dimensions, along with the many social contexts in which music appears. Here, we present experiments that do so. We report the results of two experiments using the newly-created Natural History of Song Discography. We test for the existence of form-function links in the vocal music of 86 human cultures (Experiment 1) and investigate the forms of this music to explore the mechanisms by which listeners may infer form from function (Experiment 2).

Results

Views from the academy

Historically, the idea that there might be universals in music from many cultures has been met with considerable skepticism, especially among music scholars. This is unsurprising given the leeriness of human universals common across academic disciplines (see [30] for discussion), but the shaky state of evidence for universals in music and the inferential issues described above may in fact justify this skepticism.

Because intellectual trends on controversial topics can change rapidly, we quantified current views on the issue by surveying 940 academics at all career stages who self-reported affiliations in ethnomusicology (n = 206), music theory (n = 148), other areas of music scholarship (n = 299), and psychological and cognitive sciences (n = 302; in total, 15 scholars indicated multiple affiliations). The sample included respondents born in 56 different countries. We asked respondents to consider an imaginary experiment wherein people listened to examples of vocal music from all cultures to ever exist and to predict two outcomes: (1) whether or not people would accurately identify the social function of each piece of music on the basis of its form alone, and (2) whether peoples’ ratings would be consistent with one another (the full text of the questions are in STAR Methods and the dataset openly available at https://osf.io/xpbq2).

The responses differed strikingly across academic fields. Among academics who self-identified as cognitive scientists, 72.9% predicted that listeners would make accurate form-function inferences and 73.2% predicted that those inferences would be mutually consistent. In contrast, only 28.8% of ethnomusicologists predicted accurate form-function inferences and 27.8% predicted mutually consistent ratings. Music theorists were more equivocal (50.7% and 52.0%), as were academics in other music disciplines (e.g., composition, music performance, music technology; 59.2% and 52.8%). When restricting the sample to tenure-track, tenured, and retired academics (n = 539), the results were comparable, with a gap of over 50 percentage points between cognitive scientists and ethnomusicologists on both measures. In sum, there is substantial disagreement among scholars about the possibility of a form-function link in human song.

Experiment 1

We used the Natural History of Song Discography to conduct a real version of the imaginary experiment we presented to survey respondents. This collection includes vocal music drawn pseudo-randomly from 86 predominantly small-scale societies, including hunter-gatherers, pastoralists, and subsistence farmers and spanning all 30 world regions defined by the Probability Sample Files of the Human Relations Area Files [31,32] (see Figure 1A and Table 1). Over 75 languages are represented. The discography was assembled by sampling four recordings from each region, with each recording representing a specific social function: dance, healing, love, or lullaby (see Figure 1A for details on the selection criteria). These four functions were chosen because they are known to be found across many cultures [26–29,33,34] and are relevant to the biological and cultural evolution of music [15,17,18,35]. Recordings were selected on the basis of ethnographic information alone: the only auditory criterion for inclusion was that the recording included audible singing, circumventing researcher biases concerning the prototypical musical features of song forms. As such, the Natural History of Song discography is a representative sample of human music, the analyses of which can help to answer questions about universality.

The 118 recordings were made in 86 societies, the locations of which are plotted in A, and sorted by song function (see Legend). Details on the societies and recordings are in Table 1 and in Method details. B, Locations of the listeners in Experiment 1 (n = 750), plotted with geolocation data gathered from IP addresses. Each gray dot represents a single listener; darker dots represent multiple listeners in the same region. Details on listener demographics and countries represented are in STAR Methods and Figure S1. See also Figure S1.

Table 1. Listing of societies and locations from which recordings were gathered.

All data are used with permission from the Natural History of Song project and are subject to correction. When multiple song types are indicated for the same society, they correspond to multiple recordings (i.e., not multiple types for the same recording). See also Figure 1.

Society	Subsistence type	Region	Sub-region	Song type(s) used
Ainu	Primarily hunter-gatherers	Asia	East Asia	Dance, Lullaby
Aka	Hunter-gatherers	Africa	Central Africa	Dance, Lullaby
Akan	Horticulturalists	Africa	Western Africa	Healing
Alacaluf	Hunter-gatherers	South America	Southern South America	Love
Amhara	Intensive agriculturalists	Africa	Eastern Africa	Love
Anggor	Horticulturalists	Oceania	Melanesia	Healing
Aymara	Horticulturalists	South America	Central Andes	Dance
Bahia Brazilians	Intensive agriculturalists	South America	Eastern South America	Dance, Healing
Bai	Intensive agriculturalists	Asia	East Asia	Love
Blackfoot	Hunter-gatherers	North America	Plains and Plateau	Dance, Lullaby
Chachi	Horticulturalists	South America	Northwestern South America	Dance
Chewa	Horticulturalists	Africa	Southern Africa	Lullaby
Chukchee	Pastoralists	Asia	North Asia	Dance, Lullaby
Chuuk	Other subsistence combinations	Oceania	Micronesia	Dance, Love
Emberá	Horticulturalists	Middle America and the Caribbean	Central America	Dance
Ewe	Horticulturalists	Africa	Western Africa	Dance
Fulani	Pastoralists	Africa	Western Africa	Love
Fut	Horticulturalists	Africa	Western Africa	Lullaby
Ganda	Intensive agriculturalists	Africa	Eastern Africa	Healing
Garifuna	Horticulturalists	Middle America and the Caribbean	Central America	Love
Garo	Horticulturalists	Asia	South Asia	Dance
Georgia	Intensive agriculturalists	Europe	Southeastern Europe	Healing
Goajiro	Pastoralists	South America	Northwestern South America	Lullaby
Gourara	Agro-pastoralists	Africa	Northern Africa	Dance
Greeks	Intensive agriculturalists	Europe	Southeastern Europe	Dance, Lullaby
Guarani	Other subsistence combinations	South America	Eastern South America	Love, Lullaby
Haida	Hunter-gatherers	North America	Northwest Coast and California	Lullaby
Hawaiians	Intensive agriculturalists	Oceania	Polynesia	Dance, Healing, Love
Highland Scots	Other subsistence combinations	Europe	British Isles	Dance, Love, Lullaby
Hopi	Intensive agriculturalists	North America	Southwest and Basin	Dance, Lullaby
Huichol	Horticulturalists	Middle America and the Caribbean	Northern Mexico	Love
Iglulik Inuit	Hunter-gatherers	North America	Arctic and Subarctic	Lullaby
Iroquois	Horticulturalists	North America	Eastern Woodlands	Dance, Healing, Lullaby
Iwaidja	Hunter-gatherers	Oceania	Australia	Love
Javaé	Horticulturalists	South America	Amazon and Orinoco	Lullaby
Kanaks	Horticulturalists	Oceania	Melanesia	Dance, Lullaby
Kelabit	Horticulturalists	Asia	Southeast Asia	Love
Kogi	Horticulturalists	South America	Northwestern South America	Healing, Love
Korea	Intensive agriculturalists	Asia	East Asia	Healing
Kuna	Horticulturalists	Middle America and the Caribbean	Central America	Healing, Lullaby
Kurds	Pastoralists	Middle East	Middle East	Dance, Love, Lullaby
Kwakwaka’wakw	Hunter-gatherers	North America	Northwest Coast and California	Healing, Love
Lardil	Hunter-gatherers	Oceania	Australia	Lullaby
Lozi	Other subsistence combinations	Africa	Southern Africa	Dance
Lunda	Horticulturalists	Africa	Southern Africa	Healing
Maasai	Pastoralists	Africa	Eastern Africa	Dance
Marathi	Intensive agriculturalists	Asia	South Asia	Lullaby
Mataco	Primarily hunter-gatherers	South America	Southern South America	Dance, Healing
Maya (Yucatan Peninsula)	Horticulturalists	Middle America and the Caribbean	Maya Area	Healing
Mbuti	Hunter-gatherers	Africa	Central Africa	Healing
Melpa	Horticulturalists	Oceania	Melanesia	Love
Mentawaians	Horticulturalists	Asia	Southeast Asia	Dance
Meratus	Horticulturalists	Asia	Southeast Asia	Healing
Mi’kmaq	Hunter-gatherers	North America	Eastern Woodlands	Love
Nahua	Other subsistence combinations	Middle America and the Caribbean	Maya Area	Love, Lullaby
Nanai	Primarily hunter-gatherers	Asia	North Asia	Healing
Navajo	Intensive agriculturalists	North America	Southwest and Basin	Love
Nenets	Pastoralists	Asia	North Asia	Love
Nyangatom	Pastoralists	Africa	Eastern Africa	Lullaby
Ojibwa	Hunter-gatherers	North America	Arctic and Subarctic	Dance, Healing, Love
Ona	Hunter-gatherers	South America	Southern South America	Lullaby
Otavalo Quichua	Horticulturalists	South America	Central Andes	Healing
Pawnee	Primarily hunter-gatherers	North America	Plains and Plateau	Healing, Love
Phunoi	Horticulturalists	Asia	Southeast Asia	Lullaby
Q’ero Quichua	Agro-pastoralists	South America	Central Andes	Love, Lullaby
Quechan	Intensive agriculturalists	North America	Southwest and Basin	Healing
Rwandans	Intensive agriculturalists	Africa	Central Africa	Love
Saami	Pastoralists	Europe	Scandinavia	Love, Lullaby
Samoans	Horticulturalists	Oceania	Polynesia	Lullaby
Saramaka	Other subsistence combinations	South America	Amazon and Orinoco	Dance, Love
Serbs	Intensive agriculturalists	Europe	Southeastern Europe	Love
Seri	Hunter-gatherers	Middle America and the Caribbean	Northern Mexico	Healing, Lullaby
Sweden	Intensive agriculturalists	Europe	Scandinavia	Dance
Thakali	Agro-pastoralists	Asia	South Asia	Love
Tlingit	Hunter-gatherers	North America	Northwest Coast and California	Dance
Tuareg	Agro-pastoralists	Africa	Northern Africa	Love, Lullaby
Tunisians	Intensive agriculturalists	Africa	Northern Africa	Healing
Turkmen	Intensive agriculturalists	Middle East	Middle East	Healing
Tzeltal	Horticulturalists	Middle America and the Caribbean	Maya Area	Dance
Uttar Pradesh	Intensive agriculturalists	Asia	South Asia	Healing
Walbiri	Hunter-gatherers	Oceania	Australia	Healing
Yapese	Horticulturalists	Oceania	Micronesia	Healing, Lullaby
Yaqui	Intensive agriculturalists	Middle America and the Caribbean	Northern Mexico	Dance
Ye’kuana	Horticulturalists	South America	Amazon and Orinoco	Healing
Yolngu	Hunter-gatherers	Oceania	Australia	Dance
Zulu	Horticulturalists	Africa	Southern Africa	Love

Open in a new tab

If music exhibits universal form-function associations, then (1) listeners who are unfamiliar with a given culture’s music should nonetheless accurately identify the functions of songs from that culture based on their forms alone; and (2) listeners should demonstrate comparable form-function inferences regardless of their cultural background. We pre-registered the form-function hypothesis (see https://osf.io/xpbq2) and tested it here, in Experiment 1. We presented the 118 song excerpts to 750 internet users in 60 countries (see Figure 1B and Figure S1). To ensure that listeners could hear the songs, we required them to pass a headphone screening task [36]; we also included a variety of manipulation checks designed to remove inattentive participants (see STAR Methods). Participants listened to a random sample of 36 song excerpts, yielding an average of 225 independent listens (SD = 13.9, range: 175–254) for each of the 118 songs (26,580 in total). The broad range of cultures and languages represented in the Natural History of Song Discography, combined with the broad range of countries of origin of the participants, makes it likely that participants were both unfamiliar with the music they heard and unable to understand the lyrics.

After each excerpt, participants answered 6 questions indicating their perceptions of the function of each song: on 6-point scales, the degree to which they believed that each song was used (1) for dancing; (2) to soothe a baby; (3) to heal illness; (4) to express love for another person; (5) to mourn the dead; and (6) to tell a story. In total, participants provided 159,480 ratings (26,580 total listens × 6 ratings/song). The first four questions correspond to actual functions of the songs, while the last two do not: they were included as foils, to dissuade listeners from an assumption that only four song types were actually present, which could have influenced their responses toward the study’s hypothesis. However, because storytelling and mourning are common functions of music in small-scale societies worldwide [33,34], we also analyzed responses on these dimensions; the songs in the Natural History of Song discography are not explicitly used for storytelling or mourning, but they may nevertheless share features in reliable patterns with songs that are. A demonstration experiment is at https://harvard.az1.qualtrics.com/jfe/form/SV_e8M5XpwzWS7A0Nn and all data and song excerpts are at https://osf.io/xpbq2.

The analysis strategy had two parts. First, we tested the accuracy of listeners’ function inferences via no-constant multiple regressions of the average rating on each of the six questions, with binary predictors for each of the four song functions. We compared perceived song functions to actual song functions via post-hoc general linear hypothesis tests of two types: (1) comparisons of perceived function across known song functions (e.g., “are lullabies rated higher on ‘…to soothe a baby’ than dance songs are?”), and (2) comparisons of each song form to the base rate for a perceived function across all songs (e.g., “are lullabies rated lower on ‘…for dancing’ than the average song is?”). The latter analysis is informative in both positive and negative directions: response patterns reveal listeners’ intuitions both for whether a song form has a given function and whether it does not. For all analyses, we report results both in raw units (a song type’s average rating from “Definitely not used…” [1] to “Definitely used…” [6]) and in standardized units (z-scores). Full reporting is in Tables 2 and 3.

Table 2. Main effects.

Each section of the table reports general linear hypothesis tests comparing the four main function ratings corresponding to the target song type to the function ratings for the other three song types (e.g., Are dance songs rated higher on the function “for dancing” than lullabies, love songs, or healing songs?). Comparisons for each item are listed in descending order of effect size. Bolded results are statistically significant at alpha = .05. See also Figure 2.

	M_diff	95% CI	F(1,114)	p	z-score
Dance songs as used “for dancing”
vs. lullabies	2.18	[1.66, 2.70]	68.5	2.74 × 10⁻¹³	1.70
vs. love songs	1.38	[0.86, 1.90]	27.6	7.11 × 10⁻⁷	1.08
vs. healing songs	1.09	[0.56, 1.62]	16.6	8.68 × 10⁻⁵	0.85
Lullabies as used “to soothe a baby”
vs. dance songs	1.53	[1.15, 1.91]	63.3	1.44 × 10⁻¹²	1.60
vs. healing songs	1.42	[1.03, 1.80]	52.4	5.59× 10⁻¹¹	1.48
vs. love songs	1.19	[0.81, 1.57]	38.0	1.08 × 10⁻⁸	1.24
Healing songs as used “to heal illness”
vs. dance songs	0.47	[0.20, 0.73]	11.8	.000826	0.87
vs. love songs	0.31	[0.04, 0.58]	5.14	.0253	0.57
vs. lullabies	0.26	[−0.01, 0.52]	3.58	.0611	0.48
Love songs as used “to express love to another person”
vs. healing songs	0.46	[0.19, 0.74]	11.0	.00122	0.83
vs. dance songs	0.14	[−0.13, 0.41]	1.00	.319	0.25
vs. lullabies	0.03	[−0.24, 0.30]	0.04	.839	0.05

Open in a new tab

Table 3. Exploratory effects.

Each section of the table reports general linear hypothesis tests of ratings on the two foil dimensions for two target song types. Comparisons are between a target song type (e.g., healing songs for “to mourn the dead”) and the other three song types and are listed in descending order of effect size. Bolded results are statistically significant at alpha = .05. The statistics in this table correspond to the visualizations in the left-hand panels of Figure 3.

	M_diff	95% CI	F(1,114)	p	z-score
Healing songs as used “to mourn the dead”
vs. dance songs	0.73	[0.34, 1.13]	13.8	.000320	0.93
vs. lullabies	0.38	[−0.01, 0.77]	3.68	.0576	0.48
vs. love songs	0.29	[−0.10, 0.68]	2.11	.149	0.36
Love songs as used “to tell a story”
vs. lullabies	0.33	[0.11, 0.54]	8.79	.00368	0.74
vs. healing songs	0.26	[0.04, 0.49]	5.57	.0199	0.60
vs. dance songs	0.19	[−0.03, 0.41]	2.91	.0910	0.43

Open in a new tab

Second, to investigate the uniformity of form-function inferences across participants, we split our sample into three cohorts (N = 250 each: United States, India, and a “World” cohort of 58 other countries with relatively low Human Development Index scores; see STAR Methods and Figure S1) and examined the degree of cohort-wise agreement for each function rating. For each question, we ran three multiple regressions, each predicting one cohort’s average ratings for each song from those of the other two cohorts; we report the best-fitting regression. Listeners’ perceptions of song functions were in reliable agreement with the songs’ actual functions. When listening to dance songs, participants rated them as used “for dancing” higher than they did for any other song type (Figure 2A), with the mean difference (M_diff) in raw scores ranging from 1.09 to 2.18 (on a 6-point scale). These effects correspond to z-scores of 0.85 to 1.70 (Table 2). Dance songs were also rated substantially higher than the base rate of “used for dancing” across all songs (M_diff = 1.16, 95% CI = [0.79, 1.53], F(1,114) = 39.1, p = 7.23 × 10⁻⁹, z-score = 0.91), while lullabies were rated substantially lower than the base rate (M_diff = −1.01, 95% CI = [−1.38, −0.65], F(1,114) = 29.7, p = 2.98 × 10⁻⁷, z-score = −0.80). Moreover, these ratings were reliable across listeners: listeners’ ratings of “…for dancing” were tightly related to one another between the USA, India, and World cohorts (Figure 2B; F(2,115) = 1877.5, p = 4.67 × 10⁻⁹⁰, R² = .970). Listeners thus intuited that dance songs are the most “for dancing” of all song forms, whereas lullabies are not for dancing. Moreover, despite their near-complete unfamiliarity with the music they heard, listeners at opposite ends of the world shared intuitions for the musical forms of dance songs.

Participants, who were unaware of the functions of songs from which excerpts were drawn, were asked to judge the function of each excerpt on each dimension on a scale from 1 (“Definitely not for…”) to 6 (“Definitely for…”). Results are grouped by question, one per box, with the text of each question at the top of each box. The left side of each box presents listeners’ *perceived* function of each song plotted as a function of the songs’ *actual* functions, in violin plots. The right side of each box presents the degree of agreement in ratings across the three cohorts of listeners. In all plots, each point represents a song’s average rating. In the violin plots (left side), song-wise averages are reported both as raw ratings (left y-axis) and as z-scores (right y-axis); the latter included for reference to effect sizes relative to a Normal distribution. The violin plots are kernel density estimations, the black lines are means, and the shaded white areas are the 95% confidence intervals of the means. Dotted lines denote the grand mean on each question, which varies in units of raw ratings, but due to normalization, is always 0 in z-scores. In the 3D scatterplots (right side), the dotted line is the equation z = y = x; that is, perfect consistency across cohorts. Please visit https://osf.io/xpbq2 to explore the 3D plots directly; these online versions can be rotated and zoomed interactively. A, Listeners rate dance songs higher on “for dancing” than all other song types and higher than the average song. B, Across all songs, ratings on “for dancing” are highly consistent across listeners in the USA, India and 58 other countries in the “World” cohort, and dance songs (in blue) form a strikingly distinct group from other songs. C and D, Comparable patterns are found with lullabies: listeners rate them higher than any other song type as used “to soothe a baby”, higher than average, and consistently across cohorts. E, Similarly, function inferences for healing songs are distinct from other song types, but with smaller effect sizes, and F, listener ratings are consistent across cohorts but with more variation than in B and D. In G, listeners do not accurately rate love songs as used “to express love for another person”; but nevertheless, H, ratings on this dimension are consistent across cohorts. Asterisks denote p-values from general linear hypothesis tests (left panels) or multiple regression omnibus tests (right panels). ***p < .001, **p < .01, *p < .05, ^tp < .1. See also Tables S4–S7.

These effects are large. The raw difference in ratings between lullabies and dance songs (M_diff = 2.18) covers more than one third of the entire scale available. The same comparison in units of standard deviation (z-score = 1.70) is roughly the size of the average difference in height between men and women worldwide [37] and over three times the size of typical effects in psychology [38].

In results of similar sizes and patterns, listeners rated lullabies as used “to soothe a baby” higher than any other song type (Figure 2C and Table 2). Their ratings were far higher than the base rate across all songs (M_diff = 1.03, 95% CI = [0.76, 1.30], F(1,114) = 57.0, p = 1.16 × 10⁻¹¹, z-score = 1.07). Further, dance and healing excerpts were rated lower than the base rate, indicating that listeners felt dance and healing songs are not for soothing babies (dance songs: M_diff = −0.50, 95% CI = [−0.77, −0.23], F(1,114) = 13.7, p = .0003, z-score = −0.52; healing songs: M_diff = −0.39, 95% CI = [−0.67, −0.11], F(1,114) = 7.69, p = .006, z-score = −0.41). As with dance songs, listeners’ ratings of “…to soothe a baby” were nearly identical across cohorts (Figure 2D; F(2,115) = 2188.2, p = 7.70 × 10⁻⁹⁴, R² = .974). Thus, lullabies found worldwide share enough features to elicit large and distinctive profiles of function ratings from naïve listeners. These results confirm predictions from a theoretical account of infant-directed music [17].

Inferences about healing songs showed similar patterns, though listeners were less confident in their appraisals, as indicated by smaller effect sizes (Figure 2E). They rated healing songs significantly above the base rate of the dimension “to heal illness” (M_diff = 0.26, 95% CI = [0.07, 0.45], F(1,114) = 7.21, p = .008, z-score = 0.49) and significantly higher than dance songs and love songs, with a nonsignificant difference from lullabies (Table 2). Only dance songs were rated significantly below the base rate (M_diff = −0.20, 95% CI = [−0.39, −0.02], F(1,114) = 4.69, p = .032, z-score = −0.38). Listeners around the world shared notions of which songs were used “to heal illness”, although cohort-wise agreement was lower than for dance songs or lullabies (Figure 2F; F(2,115) = 352.3, p = 1.27 × 10⁻⁵⁰, R² = .860).

Further, listener ratings exhibited a modest relation between healing songs and the foil dimension “to mourn the dead” (Figure 3A), with healing songs rated significantly higher than the base rate (M_diff = 0.36, 95% CI = [0.07, 0.64], F(1,114) = 6.27, p = .014, z-score = 0.46). Healing songs were also rated higher than dance songs and marginally higher than lullabies and love songs (Table 3). Dance songs were rated significantly lower than the base rate (M_diff = −0.38, 95% CI = [−0.65, −0.11], F(1,114) = 7.57, p = .007, z-score = 0.48). The ratings also exhibited high cohort-wise agreement, though lower than the non-foil dimensions (Figure 3B; F(2,115) = 620.4, p = 2.08 × 10⁻⁶³, R² = .915). Thus, not only are cross-cultural regularities in the forms of healing song detectable by listeners from industrialized societies, but these listeners share conceptualizations of what constitutes a healing song, despite the fact that healing songs are rare in many developed nations [18].

To mask the number of known song functions presented in the study, participants also rated the songs on two dimensions that were not explicitly represented by the songs in corpus. Thus, we had no predictions for responses on these dimensions. However, listener responses demonstrated modest, but consistent differences across song types. A, Healing songs are rated higher than average on “to mourn the dead”, and higher than both dance songs and lullabies, with B, consistent responses across cohorts. C, Despite the fact that listeners were unable to detect love songs on the dimension “to express love to another person” (see Figure 2), they *did* rate love songs higher than average on “to tell a story”, higher than healing songs and lullabies, and D, ratings across cohorts were highly consistent with one another. To explore the 3D plots directly, including rotation and zoom, please visit https://osf.io/xpbq2. Asterisks denote p-values from general linear hypothesis tests (left panels) or multiple regression omnibus tests (right panels). p < .001, **p < .01, *p < .05, ^tp < .1.

Listeners’ form-function inferences about love songs were the weakest of the four song types (Figure 2G). In contrast to the other three song types, love songs were not rated significantly higher than the base rate (M_diff = 0.15, 95% CI = [−.04, 0.35], F(1,114) = 2.45, p = .120, z-score = 0.27) and only healing songs were rated significantly below it (M_diff = −0.31, 95% CI = [−0.51, −0.11], F(1,114) = 9.60, p = .002, z-score = −0.56). Listeners rated love songs as used “to express love to another person” higher than healing songs only (M_diff = 0.46, 95% CI = [0.19, 0.74], F(1,114) = 11.0, p = .001, z-score = 0.83), but not the other two song types (Table 2). Listeners did, however, make reliable assessments in their ratings of love songs across cohorts (Figure 2H; F(2,115) = 283.6, p = 5.85 × 10⁻⁴⁶, R² = .831). They also judged love songs to be higher-than-average on the foil dimension “to tell a story” (Figure 3C; M_diff = 0.19, 95% CI = [0.04, 0.35], F(1,114) = 6.18, p = .014, z-score = 0.43), higher than both healing songs and lullabies, but not dance songs (Table 3). Ratings for “to tell a story” were highly similar across study populations (Figure 3D; F(2,115) = 235.2, p = 4.52 × 10⁻⁴², R² = .804). Listeners thus do make some form-function inferences about love songs, but they are not nearly as clear as those of the other song types we studied.

To investigate the variability of these findings across the geographic regions from which songs were recorded, we took advantage of the geographic stratification used in the construction of the Natural History of Song discography. Songs in the discography were gathered by obtaining one example of each of the four song types across 30 geographic regions (see STAR Methods), which enables a simple test of the geographic variability of the form-function inferences described above. For each of the three high-accuracy form-function inferences (i.e., dance songs used “for dancing”, lullabies used “to soothe a baby”, and healing songs used “to heal illness”) we took the region-wise average function rating across each region and counted the number of regions in which the target song type had a higher-than-average function rating.

The results show near-uniformity of form-function inferences for dance songs and lullabies across the geographic regions from which songs were sampled, with weaker results for healing songs. In 27 of 30 world regions (90.0%), dance songs were rated higher as “for dancing” than the other three song types; in 29 of 30 regions (96.7%), lullabies were rated higher as “to soothe a baby” than the other three song types; and in 20 of 28 regions (71.4%; n.b., the Natural History of Song discography lacks healing songs from two regions) were healing songs rated higher as “to heal illness” than the other three song types. Thus, not only are listeners’ form-function inferences accurate and reliable, but they show a strong degree of uniformity across the cultures studied (especially for dance songs and lullabies).

In sum, three common types of songs found worldwide — dance songs, lullabies, and healing songs — appear to elicit accurate and reliable form-function inferences from a diverse body of listeners. These findings are consistent with the existence of universal form-function links in human song.

Experiment 2

What features of song forms enable naïve listeners to accurately and reliably identify song functions? In Experiment 2, we conducted an exploratory investigation of the features listeners used to discriminate song functions, focusing on general traits of the recordings that are detectable by naïve listeners. We presented the same 118 excerpts from Experiment 1 to 1000 internet users in India (n = 500) and the United States (n = 500). No listeners participated in both experiments. As in Experiment 1, we required listeners to pass a headphone screening task and filtered out inattentive participants with a series of manipulation checks (see STAR Methods). Each participant listened to 18 song excerpts, yielding an average of 149 independent listens (SD = 11.3, range: 123–176) per song (17,527 in total).

For each excerpt, participants answered a random set of 5 questions drawn from a set of 10. Three corresponded with participants’ ratings of contextual aspects of the performance: (1) number of singers; (2) gender of singer(s); and (3) number of instruments. Seven corresponded with subjective musical features of the song: (1) melodic complexity; (2) rhythmic complexity; (3) tempo; (4) steady beat; (5) arousal; (6) valence; and (7) pleasantness. Listeners provided a total of 87,142 ratings (17,527 total listens × 5 ratings/song – 493 listener/song/feature combinations where no answer was provided) and split-half reliability of the items was acceptable (rs = .81–.99; see STAR Methods for more information along with the full text of the 10 items).

To assess whether and how the contextual and musical features of song forms predicted listeners’ ratings of social function, we conducted 3 sets of exploratory analyses. First, we examined the degree of variation on each of the 10 features across each of the song forms and tested whether or not song forms differed on those features. Second, we summarized the musical features via a principal components analysis. Third, we examined the influence of the songs’ contextual features and musical features on listeners’ function ratings with a series of regressions. Given the high degree of subjectivity of the ratings, the very brief excerpts, and the complete lack of context provided to the listeners, we consider these analyses to be exploratory and not exhaustive: they are intended to help explain the findings of Experiment 1, not to provide a comprehensive feature analysis of Natural History of Song recordings.

The four song types showed clear differences in both contextual and musical features (Figure S2). Unsurprisingly, the forms of dance songs and lullabies differed most from other song types, both for contextual and musical features (full reporting is in Table S1). Relative to the other three song types, listeners rated dance songs as having more singers (z-score = 0.86), more instruments (z-score = 0.76), higher melodic complexity (z-score = 0.79), higher rhythmic complexity (z-score = 0.87), faster tempo (z-score = 1.09), steadier beat (z-score = 0.84), higher arousal (z-score = 1.17), higher valence (z-score = 1.09), and higher pleasantness (z-score = 0.72). Effects for lullabies were comparably large, but in the opposite direction: relative to the other song types, lullabies were rated as having fewer singers (z-score = −0.76), fewer instruments (z-score = −0.92), lower melodic complexity (z-score = −1.12), lower rhythmic complexity (z-score = −1.06), slower tempo (z-score = −1.04), less steady beat (z-score = −0.63), lower arousal (z-score = −0.90), lower valence (z-score = −0.74), and lower pleasantness (z-score = −0.45). Lullabies were also rated substantially more likely than the other song types to have a female singer (z-score = 0.93). As in Experiment 1, results with healing songs and love songs were mostly inconclusive (see Table S1). In sum, listeners heard substantial differences between the forms of lullabies and dance songs, with modest results for healing and love songs.

Because the 7 musical ratings were highly correlated with one another (Table S2) we conducted a principal components analysis to summarize them. This yielded two components with eigenvalues greater than 1, explaining 88.1% of item variance. We report unrotated components here and throughout. Component 1 correlated moderately and positively with all 7 features, while Component 2 correlated negatively with melodic and rhythmic complexity, positively with pleasantness and steady beat, and did not correlate with valence or arousal (full reporting is in Table S3).

Because listeners in Experiment 1 did not provide mutually exclusive ratings for song function, as they did in previous work (e.g., [24], where listeners rated songs as either “lullaby” or “love song”), listener “errors” in ratings can be captured here on continuous scales. To explore cases where different song types were highly rated on the same function (e.g., a healing song and a dance song both rated highly — and erroneously — as “to soothe a baby”), we plotted each song’s function rating against its location in principal components space. This analysis, visualized in Figure 4, demonstrates the relation between the strength of each song’s function rating (from Experiment 1) and a two-dimensional summary of each song’s form (from Experiment 2).

In the scatterplots (**A–D**), each point shows the location of a song in principal-components space, along with the strength of its form-function inference (i.e., in A, the larger the point, the higher the song’s rating on “for dancing”). Bubble sizes are unstandardized across plots. As in the previous figures, dance songs are depicted in blue, healing songs in red, love songs in yellow, and lullabies in green. See also Figure S2 and Tables S1–S7.

There were two main results. First, songs of different types overlapped substantially in principal components space. Second, incorrect ratings occur non-randomly: songs rated erroneously high on a given function tend to share similar forms with songs that do have that function. This pattern is evident for all song types, including those with accurate, reliable form-function inferences: while lullabies and dance songs were clearly distinguished from one another in Experiment 1, in principal components space, some lullabies appear alongside dance songs and are rated correspondingly high on the dimension “for dancing”. The converse is also true.

Last, we examined the extent to which the feature ratings in Experiment 2 explained the form-function inferences in Experiment 1. If function inferences are determined by contextual features alone, the findings of Experiment 1 may simply reflect broad patterns in how music is used across cultures — e.g., “lullabies usually have only one singer, who is usually female” — rather than supporting the hypothesis that song forms themselves inform listeners’ function inferences. To test this question, we built four series of regression models (one series per function rating). Within each series, we examined the degree to which their variance was explained by the contextual feature ratings alone (Model 1), the principal-components reduction of musical feature ratings alone (Model 2), both sets of features (Model 3), and both sets with an indicator variable for the target song type (Model 4; full reporting is in Tables S4–S7).

Relative to models predicting perceived song function from contextual features alone, the inclusion of the two principal components and the target song form as covariates substantially increased model fit. A model with only the contextual features predicted 74.6% of variance in the function rating “for dancing” (Table S4; F(3,114) = 112, p = 8.07 × 10⁻³⁴), whereas the inclusion of the principal components and an indicator variable for dance songs increased explanatory power by 14.8 percentage points (R² = .895; nested test: F(3,111) = 52.0, p = 4.59 × 10⁻²¹). Even with these covariates, the indicator for dance songs explained unique variance (partial R² = .0846, p = .002). For lullabies (Table S5), a model with contextual features, principal components, and an indicator variable for lullabies explained 9.7 percentage points more variance in the function rating “to soothe a baby” (R² = .683) than did a model with only contextual features (R² = .586), a significant difference (nested test: F(3,111) = 11.3, p = 1.55 × 10⁻⁶). As with dance songs, the indicator for lullabies explained unique variance (partial R² = .094, p = .0009). Similar results were present in healing songs (Table S6) and love songs (Table S7).

In sum, the form-function inferences that listeners made in Experiment 1 cannot be explained solely by contextual features of music. While they are indeed predictive of listeners’ function ratings, subjectively-rated musical features of each song tended to explain more unique variability in function ratings than did contextual ratings. Moreover, neither contextual nor musical features fully explained function ratings: an identifier covariate in models for all four song types explained unique variance in function ratings. Function detection in song is thus facilitated by both contextual and musical features of song forms — and by other features reliably present in songs that were not measured in Experiment 2.

Discussion

The present research provides evidence for the existence of recurrent, perceptible features of three domains of vocal music across 86 human societies, and the striking consistency of form-function percepts across listeners from around the globe — listeners who presumably know little or nothing about the music of indigenous peoples. Moreover, these studies suggest that song types differ from each other on the basis of both contextual and musical features, but musical features tend to be more predictive of form-function inferences than contextual features.

Why do songs that share social functions have convergent forms? If dance songs are shaped by adaptations for signaling coalition quality [15], their contextual and musical features should amplify that signal. The feature ratings in Experiment 2 support this idea: dance songs tend to have more singers, more instruments, more complex melodies, and more complex rhythms than other forms of music. If lullabies are shaped by adaptations for signaling parental attention to infants [17], their acoustic features should amplify that signal. The feature ratings in Experiment 2 support this idea: lullabies tend to be rhythmically and melodically simpler, slower, sung by one female person, and with low arousal relative to other forms of music.

This work raises two key questions about the basic facts of music. First, despite the geographic variation in listeners in Experiment 1, all participants were English-literate and had access to an expansive variety of music on the Internet. They thus share a great deal of musical experience. Do form-function inferences generalize to all listeners worldwide, even those who have no shared musical experience, or who know only the music of their own culture? A stronger test of universality would require testing the inferences of people living in isolated societies with minimal access to the music of other cultures.

Second, while we used naïve listeners’ perceptions of musical forms to explore what drove form-function inferences, those perceptions are subjective, were based on brief excerpts of the songs rather than full performances, and lack rich contextual information available from ethnomusicologists and anthropologists. Are the musical and contextual features of the songs that inform function inferences universal? A stronger demonstration of universals in music would require in-depth feature analyses of a cross-culturally representative sample of music from small-scale societies, informed by expert listeners, music information retrieval, and modern approaches from data science.

Nevertheless, the present research demonstrates that cross-cultural regularities in human behavior pattern music into recurrent, recognizable forms, while maintaining its profound and beautiful variability across cultures.

STAR Methods

Contact for resource sharing

Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Samuel Mehr (sam@wjh.harvard.edu).

Experimental model and subject details

Survey of academics

940 academics (390 female, 439 male, 3 other, 108 did not disclose; age 20–91 years, mean = 46.7, SD = 14.5) born in 56 countries were recruited in two fashions: first, by emailing all affiliates publicly listed in the Music and Psychology/Cognitive Science departments at the top 200 universities listed for each department in the U.S. News & World Report Best Colleges; and second, by distributing the survey anonymously to three music listservs (Society for Ethnomusicology, Society for Music Theory, and AUDITORY). No participants were excluded from analyses. Participants were given the opportunity to enter into a drawing for 50 gift cards of $25 value and could opt out of any/all questions on the survey. All participants agreed to a consent statement before the study, which was approved by Harvard University’s Committee on the Use of Human Subjects. All procedures were in accordance with approved guidelines.