Abstract
Constructed languages, frequently invented to support world-building in fantasy and science fiction genres, are often designed so that their sound reflects the characteristics of the people who speak them. The aims of this study are (1) to investigate whether some fictional languages, such as Orkish, whose speakers are portrayed as villainous, are rated more negatively by listeners than, for example, the Elvish languages, even when they are all produced without emotional involvement in the voice; and (2) to investigate whether the rating results can be related to the sound structure of the languages under investigation. An online rating experiment with three 7-point semantic differential scales was conducted, in which three sentences from each of 12 fictional languages (Neo-Orkish, Quenya, Sindarin, Khuzdul, Adûnaic, Klingon, Vulcan, Atlantean, Dothraki, Na’vi, Kesh, ʕuiʕuid), each spoken by a female and a male speaker, were rated. The results from 129 participants indicate that Klingon and Dothraki do indeed sound more unpleasant, evil, and aggressive than the Elvish languages Sindarin and Quenya. Furthermore, this difference in rating is predicted by certain characteristics of the sound structure, such as the percentage of non-German sounds and the percentage of voicing. The implications of these results are discussed in relation to theories of language attitude.
Keywords: Language attitude, phonaesthetics, sound symbolism, constructed languages
1 Introduction
World-building in popular fantasy culture often involves the invention of languages that are intended to reflect the attitudes and characteristics of the “peoples” who speak these so-called constructed languages (conlangs for short). For example, Black Speech or Orkish, the language of the Orcs in Tolkien’s realm of Middle-earth, is intended to resemble in sound the appearance and character traits of the creatures who speak it (see, for example, Gymnich, 2005; Podhorodecka, 2007; Smith, 2007). In Tolkien’s own words, it sounds “at all time full of hate and anger” (Tolkien, 2021, p. 445). Another intention of many conlang inventors in popular fantasy culture is to create a language that sounds distinctly different from natural languages. That is to say, a conlang should convey an otherness or strangeness, usually relative to the major Western languages. From a linguistic perspective, the question arises whether phonetic and phonological properties alone can indeed evoke such intended impressions, or whether these impressions depend not on the properties of the languages but rather on the appearance of their speakers (e.g., Elves vs. Orcs), the disposition of the speakers, their portrayal, or special sound effects.
Within the field of language attitude research, the view that judgments about natural languages are based on their inherent linguistic properties, such as the sound inventory or the syllable structure, has been subsumed under the “inherent value hypothesis” by Giles et al. (1979). In natural languages, however, listeners judge the pleasantness of, for instance, dialect varieties on the basis of imposed social prestige and listeners’ attitudes toward the speakers of that variety (see, for example, Giles & Niedzielski, 1998). This so-called “imposed-norm hypothesis” requires some knowledge of what was said and the social traits associated with the speakers. The dialect variety must also be known to the listener. However, conlangs are unknown to most people, and the meaning of an utterance in a conlang is therefore mostly impenetrable without a translation, as is everything else related to their grammar (Beinhoff, 2015). Thus, testing the pleasantness of conlangs without providing any knowledge about them minimizes the possible effect of imposed norms. Accordingly, we hypothesize that the aesthetic evaluations of conlangs can be attributed solely to the sound of the conlangs, as this is the only remaining source accessible to a listener for evaluation. In addition, some conlangs are specifically designed to evoke attitudes through the use of phonaesthetic features. For these reasons, conlangs are an ideal testing ground for Giles et al.’s (1979) “inherent value hypothesis”.
This study has two aims: first, we test whether listeners rate fictional languages according to the impressions intended for them by their designers on three scales (pleasant–unpleasant, good–evil, peaceful–aggressive), even when the stimuli are produced in a voice without special sound effects and emotional involvement. Second, we compare the rating results with phonetic and phonological characteristics of the stimuli, such as the percentage of obstruents and vowels, the sonority index of the stimuli, the number of non-German segments, and the percentage of voicing. Evidence of a relationship between the ratings and the phonological and phonetic properties would support the notion of “phonetic fitness” (see Tolkien, 2016, p. 67). This term, also called lámatyávë in Quenya by Tolkien (1993, p. 216), refers to the view that aesthetic and emotive aspects of words are embedded in their phonetic form (see Podhorodecka, 2007).
The following introduction provides an overview of current research on language attitudes and phonaesthetics, particularly in relation to conlangs. In addition, the fictional languages chosen for the current study are described, with information on their phonological features and a brief description of the presumed dispositions of their speakers.
1.1 Language attitude and phonaesthetics
What makes a language sound beautiful or ugly? Giles et al. (1979) discuss two possible causes: the inherent value hypothesis and the imposed norm hypothesis. The former states that the inherent properties of a language, such as its sounds, syllable structure, sonority, and prosody, influence its perception. Van Bezooijen (2002) uses the more specific term sound-driven hypothesis. According to the latter, the imposed norm hypothesis, listeners’ attitudes toward the speakers of a language are supposed to shape its perception (see, for example, Reiterer et al., 2020). This implies that social norms, stereotypes, and other non-linguistic factors are crucial for the evaluation of a language or dialect (see Giles et al., 1979; Trudgill & Giles, 1978). Another important factor in aesthetic judgments of languages is speech intelligibility. In other words, dialects that are less intelligible to listeners are also judged to be less pleasant (cf. Giles & Niedzielski, 1998; Van Bezooijen, 2002). Related to this is the familiarity-driven hypothesis, which states that the more familiar people are with a language or language variety, the more positively they will rate it. Van Bezooijen (2002) also points to a positive relationship between ratings and similarity to the standard variety. We try to relate our results to these hypotheses.
In a recent rating study of 16 European languages, Reiterer et al. (2020) showed that aesthetic judgments are affected by a mixture of all these factors, which are difficult to distinguish. In their study, 45 participants with different L1s (50% Slovenian, followed by German, American English, and others) rated these languages on 22 binary semantic scales, later condensed to 5: Beauty–Welcoming (short “Beauty”), Culture–Status (“Status”), Eroticism (“Eros”), Sweet–Soft (“Softness”), and “Orderliness.” The results show that the best ratings were given to languages with higher scores for sonority, vocalic share, and speech rhythm. The languages that scored the highest on all scales combined were French, English, Italian, Spanish, and Catalan, while Danish, Greek, Hungarian, Polish, and Welsh scored the lowest, thus confirming the inherent value hypothesis. Further evidence can be found in Kogan and Reiterer (2021). They showed that the participants tended to perceive languages as more pleasant and “melodic” when they were produced at a higher speaking rate and with less F0 variation. The negative rating found for slower and more modulated languages may be due to their syllable structure: The more complex the syllables, the slower the speech tempo and the more difficult it is to vary pitch. Similarly, the repetition of consonant–vowel (CV) syllables may evoke the impression of musical structures, leading to a more pleasant rating. Speech rate is also important for identifying foreign languages (Dufter & Reich, 2003). Previous studies have shown that speech rate and the variability of the fundamental frequency are also crucial for the perception of ideophones (Dingemanse et al., 2016).
Apart from these sound-related factors, Reiterer et al. (2020) also found a significant positive correlation between rated familiarity and beauty: for example, Welsh and Greek, despite their high sonority values, received lower ratings than the more familiar languages such as English and French. However, it should be noted that Catalan, with a low recognition rate, was rated very positively, whereas German, which was identified by all participants, was one of the most negatively rated languages. This could be explained by the high sonority values for Catalan and the lower values in German. However, sonority cannot be the only cause, since the sonority index of English and German is similar but English was rated much more positively. Therefore, Reiterer et al. (2020) suggest that social factors could explain the lower ranking of German. However, an earlier study by Van Bezooijen (2002) found no correlation between the rated familiarity and how beautiful a language sounds. The results of a very recent study by Anikin et al. (2023) indicate, on the contrary, that familiarity may have a very important impact on how pleasant a language is perceived to be. In another study (see Mooshammer et al., 2023), we also found indications that recognizing a conlang influences how that conlang is assessed. We therefore assume that familiarity has a strong effect on language assessment.
One possible way to separate sound-related factors from other effects, such as familiarity and intelligibility as well as social factors, is to investigate languages that are not recognized by the listeners. To our knowledge, there are only very few studies using natural languages for this purpose. One example is a study by Moreau et al. (2014), who found that English listeners were able to infer the socio-economic status of Wolof speakers without any knowledge or experience of the language. Moreau et al. (2014) do not, however, interpret this as clear evidence for the inherent value hypothesis but rather argue for the inclusion of language-universal factors in the imposed norm hypothesis. Hilton et al. (2022) investigated aesthetic judgments on languages that were unfamiliar to the raters. They found that Mandarin Chinese listeners rated Swedish as more pleasant than Danish without any knowledge of either language. When tonal information was removed from the stimuli, this effect disappeared, which, according to Hilton et al. (2022), may be evidence for a preference of Mandarin Chinese listeners for the prosody of Swedish. This may be related to Mandarin Chinese listeners being highly sensitive to tonal features due to their L1. Beyond that, it lends support to the familiarity-driven hypothesis and emphasizes the role familiarity seems to play in evaluating the sound of any language, known or unknown (cf. Anikin et al., 2023; Leemann et al., 2015). Hence, disentangling the impact of familiarity and other possible factors on the evaluation of a language is complicated and will be addressed in more detail below (see Section 4).
Separating possible effects of the imposed norm hypothesis is another issue that can, however, be resolved more easily. Because this hypothesis depends strongly on the knowledge about speakers, the variety they are using, and what they are saying (social factors), employing unknown languages minimizes the assumed impact of this hypothesis (see Section 4 for a further discussion of this point). Since the present study is based on constructed languages, which have the advantage of being less easily recognized than the major European languages in Reiterer et al. (2020), social factors (e.g., English is associated with prestige and culture; Reiterer et al., 2020) as a potential confound are consequently avoided. This allows us to distinguish between the inherent value hypothesis and the imposed norm hypothesis by Giles et al. (1979): If listeners consistently rate conlangs differently from each other, for example, in terms of their pleasantness, this would support the inherent value hypothesis because listeners cannot associate cultural norms with unknown languages—at least not consistently. Thus, consistent and significantly different ratings of the conlangs can be attributed to the sound of the stimuli they heard.
But what are the phonological factors (inherent values) that influence the individual assessment of foreign languages (cf. Reiterer et al., 2020)? The question is whether phonaesthetics is the main cause for such preferences or whether other factors such as sound symbolism also play a role. Since the advent of structuralism, it has been the common view among linguists that the link between sound and meaning is arbitrary: the phonological level carries no semantic information (de Saussure, 1916). While examples of onomatopoeia such as meow or snip provide occasional exceptions, the general doctrine assumes arbitrariness. However, what is supposed to be idiosyncratic actually has a certain degree of systematicity. In a famous experiment, Köhler (1947) tested the perception of two pseudo-words, maluma and takete, by asking the participants to associate each nonword with one of two shapes: a round cloud-like shape and a spiky star-like shape. The majority associated the round shape with maluma and the star shape with takete. This experiment was later replicated and modified by Ramachandran and Hubbard (2001), who used the more controlled stimuli kiki and bouba and showed the same sound-symbolic effects. Interestingly, these results apply to several typologically unrelated languages (Lockwood & Dingemanse, 2015) and were shown to hold across different cultures and writing systems (see Ćwiek et al., 2022, who found the effect in 17 out of 25 tested languages). This shows that consistent iconicity is at work in languages across the world despite the mainly arbitrary relation between sound and meaning in most parts of the lexicon. It should be noted that ordinary prosaic words do not have the high degree of perceptual impact of true ideophones or stimuli designed for laboratory use (McLean et al., 2023). But still, the stimuli of maluma–takete and kiki–bouba stand out insofar as the voicing contrast hints at the findings by Reiterer et al. (2020) (see also Lockwood & Dingemanse, 2015).
Several studies have shown that sounds can also be associated with color, taste, speed, size, and brightness (for a recent review, see Lockwood & Dingemanse, 2015). For instance, /i/ is most often associated with the attributes precise, fast, small, and bright, in contrast to /ɑ/, which is associated with the adjectives wide, slow, large, and dark. Sound symbolic effects have also been taken into account in explaining the perception of and attitudes toward second/foreign languages (see, for example, Reiterer et al., 2020). Assuming that lightness is “positive” and darkness “negative,” as in many Western cultures, one would expect that a higher frequency of front vowels should lead to a more positive evaluation of the respective language. Conversely, the higher the frequency of back vowels, the more negative the perception of the language. Bloomfield (1909, p. 8) already saw a dichotomy between these groups of vowels: while /i/ and /e/ represent “fine, small, bright, flashing, quick, sharp, clear-cut objects or actions,” /u/ and /o/ denote “low muffled, rumbling, bubbling sounds and dull, loose, swaying, hobbling, slovenly, muddy, underhand, clumsy actions” (cf. Tolkien, 2016, p. liii). In general, the vocalic opposition often seems to lie between front and back vowels accompanied to some extent by vowel height.
This view is supported by Crystal (1995), who counted the sounds in English words that poets, readers in newspaper polls, and so on, had judged to sound pleasant. According to his results, front vowels and also the sonorants /l, m/ occur more frequently in these words than in normal everyday speech. Again, the resemblance to Köhler’s (1947) original stimulus maluma is striking. There is also evidence that sound symbolism shapes the English lexicon. For example, Winter and Perlman (2021) find that in English the front vowels /i, ɪ/ and the stop /t/ are more common in size adjectives denoting smallness, whereas the low vowel /ɑ/ is more common in adjectives associated with largeness.
In addition to the potentially symbolic function of specific sounds, the overall composition of the words also seems to play a role in their meaning. For example, the nonsense word takete is composed of vowels and voiceless stops, and is therefore less sonorous overall than maluma, which consists of sonorants and vowels. The average sonority value also depends strongly on the syllable structure of a word. A syllable consisting of an onset consonant and a vowel is on average more sonorous than one containing complex consonant clusters and/or a coda consonant, because vowels carry more acoustic energy than consonants, especially compared with voiceless obstruents. 1 We draw on these views concerning which sounds are judged as more pleasant or unpleasant to design the phonological and phonetic metrics that we investigate (see Section 2.4).
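The contrast between maluma and takete can be made concrete with a small calculation. The following Python sketch is illustrative only: the six-step sonority scale, its numeric values, and the function name mean_sonority are assumptions introduced here and do not reproduce the sonority index used in this study. One letter is treated as one segment.

```python
# Illustrative sonority scale (an assumption, not the index used in the study):
# higher values stand for more sonorous segment classes.
SONORITY = {
    "vowel": 5,
    "glide": 4,
    "liquid": 3,
    "nasal": 2,
    "fricative": 1,
    "stop": 0,
}

# Toy classification covering only the segments needed for the examples.
SEGMENT_CLASS = {
    "a": "vowel", "e": "vowel", "u": "vowel",
    "l": "liquid",
    "m": "nasal",
    "t": "stop", "k": "stop",
}


def mean_sonority(word: str) -> float:
    """Return the average sonority of a word, one letter per segment."""
    values = [SONORITY[SEGMENT_CLASS[ch]] for ch in word]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Pseudo-words from Köhler (1947), then a CV syllable vs. a CVCC syllable.
    for item in ("maluma", "takete", "ta", "takt"):
        print(item, round(mean_sonority(item), 2))
```

Under these assumed values, maluma averages higher than takete, and the open syllable ta averages higher than the closed syllable takt, which mirrors the reasoning about sonorants and syllable structure given above. A different scale would change the numbers but not necessarily the ordering.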
1.2 Phonaesthetics and constructed languages
Conlangs have been the subject of several studies on phonaesthetics, most of which focus on the relationship between sound and meaning of morphemes and words as well as the sound structure, as it relates to the intended impression of the language as a whole.
Rausch (2014) and Annear (2020), for example, found a connection between vowel quality and the meaning of words denoting size, distance, color, luminosity, and temperature in several of Tolkien’s languages (Gnomish, Goldogrin, Noldorin, Sindarin, Qenya, Quenya), confirming the sound symbolic effects of the kiki–bouba experiment. Both studies provide evidence that sound symbolism is realized in these languages through the contrastive use of front and low/back vowels for the meanings of smallness and largeness, respectively. Rausch (2014) also notes that the vowel /a/ occurs in roots that reflect warmth, greatness, and brightness, and that there is a tendency for velar stops to be used in words denoting size.
To investigate what makes a language “villainous,” Stanley (2003, p. 8) examines several conlangs “which were intended by their creators to elicit negative reactions in readers or hearers.” The conlangs discussed are Black Speech/Orkish (Middle-earth), Cardassian/Kardasi (Star Trek), Klingon (Star Trek), Romulan (Star Trek), Huttese (Star Wars), Kiffish (science fiction saga “Chanur”), and Drow/Ilythiiri (Dungeons & Dragons: The Forgotten Realms). Two main phonological strategies are identified to underline the otherness of a conlang and its consequent suitability for negatively viewed protagonists: the degree of deviation from English and certain phonological properties, that is, a high number of stops (abruptness, which has negative sound-symbolic associations), voiceless fricatives (which can resemble the sounds of a serpent, suggesting threat and deception), aspiration (which is associated with the act of spitting and a forceful character), and the presence of “gutturals” (velars, uvulars, and pharyngeals, which are associated with growling, choking, hostility, and even illness). The use of unusual combinations of these sounds can easily achieve the effect of “otherness,” which has also been emphasized by conlang designers, such as Okrand for Klingon and Peterson for Dothraki (see Section 1.3 and, e.g., Peterson, 2015).
The association of sounds with specific behaviors, attitudes, and emotions is also a key point in Podhorodecka (2007). She draws on two studies by Fónagy (1991, 1999), which argue for symbolic vocal gestures that give an utterance a level of meaning other than the arbitrary, to discuss Tolkien’s concept of lámatyávë using Quenya, Sindarin, and Black Speech as examples. Podhorodecka (2007, p. 4) refers to Fónagy’s “two basic principles of encoding vocal gestures” as metonymic and metaphoric. Metonymic gestures are features of emotive speech, such as a higher proportion of consonants, especially stops, in aggressive speech due to the shortening of vowels. Metaphoric gestures are the association of the position of the tongue (i.e., forwarding or withdrawal) with proximity and distance to other persons and, in a second step, with emotions, states of mind, and so on that make people seek proximity to or distance from each other. Her results, based on three short text samples from Quenya, Sindarin, and Black Speech, show that Black Speech has a higher CV ratio (1.7 consonants per vowel) and a higher proportion of back vowels (86%) and stops (48%) than Quenya and Sindarin. These two languages, in contrast, have many sonorants that are not distorted in aggressive speech, a high proportion of front vowels (about 50% each), and a lower CV ratio (Quenya: 1.08, Sindarin: 1.22 consonants per vowel). Thus, Podhorodecka suggests that the metonymic and metaphoric gestures of Quenya and Sindarin evoke impressions of pleasantness and goodness for these languages, whereas these gestures have the opposite effect for Black Speech. Her findings are confirmed by Johannesson (2007), who also included English as a neutral control. He finds that Quenya and Sindarin have a significantly higher proportion of sonorants than Orkish. Furthermore, in Quenya the CV structure predominates, where onsets in both open and closed syllables are often occupied by sonorants. In contrast, Black Speech favors consonant–vowel–consonant (CVC) syllables with stops being the most common onsets in all syllable types. In terms of the parameters studied, English lies between the Elvish languages and Black Speech. These relationships with sound inventories and syllable types are investigated for the languages Quenya, Sindarin, Black Speech, Dothraki, High Valyrian, and Klingon by Elsen (2019), who finds associations similar to those in the studies mentioned above.
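Measures of the kind reported in these studies (consonants per vowel, share of stops, share of front vowels) can be computed from a segmented transcription. The Python sketch below is a minimal illustration under stated assumptions: the segment sets are deliberately tiny, the function name segment_profile is hypothetical, the stop share is taken relative to all consonants (the cited studies may use a different denominator), and the two input strings are invented toy sequences, not material from any conlang.

```python
# Toy segment classes (assumptions for illustration; real inventories are larger).
VOWELS = {"a", "e", "i", "o", "u"}
FRONT_VOWELS = {"e", "i"}
STOPS = {"p", "b", "t", "d", "k", "g"}


def segment_profile(segments):
    """Compute simple sound-structure measures for a list of segments."""
    vowels = [s for s in segments if s in VOWELS]
    consonants = [s for s in segments if s not in VOWELS]
    if not vowels or not consonants:
        raise ValueError("need at least one vowel and one consonant")
    stops = [s for s in consonants if s in STOPS]
    front = [s for s in vowels if s in FRONT_VOWELS]
    return {
        # Number of consonants divided by number of vowels.
        "consonants_per_vowel": len(consonants) / len(vowels),
        # Assumption: stops are counted as a share of all consonants.
        "stop_share_of_consonants": len(stops) / len(consonants),
        # Front vowels as a share of all vowels.
        "front_vowel_share": len(front) / len(vowels),
    }


if __name__ == "__main__":
    # Invented toy sequences: one cluster-heavy, one with open syllables.
    for toy in (["k", "r", "a", "t"], ["l", "a", "n", "e"]):
        print("".join(toy), segment_profile(toy))
```

In this toy comparison, the cluster-heavy sequence yields three consonants per vowel and a high stop share, whereas the open-syllable sequence yields one consonant per vowel and no stops. The direction of the contrast resembles the one described for Black Speech versus the Elvish languages, although the values themselves carry no empirical weight.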
Another interesting approach to the study of sound symbolism in conlangs is that of Beinhoff (2015), which provides insights into the deliberate use of sound symbolism in constructing a language. Beinhoff asked conlang creators about their intentions when creating their conlangs. Her online questionnaire was answered by a total of 55 people, providing data on 105 constructed languages. The language creators were asked to indicate their main considerations when constructing the sound system of their conlangs, which could be grouped into six categories: ease of pronunciation, aesthetics/beauty, realism/naturalism, theoretical linguistic considerations, influence of other languages and their sounds, and sound symbolism. Sound symbolism turns out to be quite familiar to the group of respondents. Creators use specific sounds in the construction of their languages, especially to express the concepts of polarity (“positive” vs. “negative”), size, gender, motion (e.g., “inwards” vs. “outwards”), brightness/lightness and shape (e.g., “length” vs. “roundness”). Hence, it seems promising to use conlangs for the investigation of sound symbolism since sound symbolic features are probably concentrated there.
Recently, Beinhoff (2023) took this approach a step further. She asked conlang creators about their intentions in creating their conlangs, and then conducted an online survey of 20 speakers of British English, European Spanish, and Austrian German each, asking them to rate various conlangs (Illitan, Celestial, Itlani, Rílin, Vaior, Ljani k’ithzeri, Na’vi, Sindarin, and Quenya) on their sound and on the characteristics pleasant, friendly, natural, strange, educated, peaceful, familiar, artificial, and aggressive. She also interviewed 10 speakers of southern British English to find out what impressions and ideas they had of the conlangs and their backgrounds, just by listening to them. She compared the responses from the interviews and the results of the online survey with what the creators of the conlangs had said about the intended sound of their languages. Her results show, inter alia, that Quenya and Sindarin are indeed perceived as particularly pleasant as intended by Tolkien.
1.3 Languages analyzed here
The languages and material considered here have been chosen on the basis of the following criteria: there had to be sufficient original material and the material available should not be readily recognizable. For example, for Entish, the language of the Ents in Tolkien’s The Lord of the Rings, only a few words are available. We also had to exclude the most famous phrases in Orkish, the inscription on Sauron’s Great Ring of Power, as it is widely known. Phrases containing proper names and buzzwords were avoided as well, such as the title Khaleesi from the Dothraki language (see below), because they might be recognizable, and thus influence the participants’ judgments based on their knowledge of the peoples who speak that language. The following section describes the fictional background and the main phonological characteristics of the languages considered here. In some cases the inventors indicated whether their language was intended to sound positive or negative and whether it was based on other natural languages (as summarized in Table 1). The translation, phonetic transcription, and glosses of all stimuli are presented in Appendix A. First, the languages invented by Tolkien and included in this study are briefly described.
Table 1.
Constructed Languages Used in the Experiment.
Conlang | World | People | “pleasing”/“harsh” | Inventor reference | Phonology similar to |
---|---|---|---|---|---|
Orkish | Middle-earth | Orcs | harsh | Tolkien (2021), Appendix F | Hurrian |
Quenya | Middle-earth | Elves | pleasing | Tolkien (2021), Appendix F | See text |
Sindarin | Middle-earth | Elves | pleasing | Tolkien (2021), Appendix F | Welsh |
Khuzdul | Middle-earth | Dwarves | harsh | Tolkien (2021), Appendix F | Hebrew |
Adûnaic | Middle-earth | Humans | – | Tolkien (2002) | “Semitic” |
Klingon | Star Trek | Klingons | harsh | Okrand (1992) | – |
Vulcan | Star Trek | Vulcans | – | Okrand, Gardner and The Vulcan Language Institute (2004) | English |
Atlantean | Atlantis | Humans | – | Okrand, Shadlag (2006) | Proto-Indo-European language |
Dothraki | Game of Thrones | Humans | harsh | Peterson (2015) | See text |
Na’vi | Moon Pandora | Na’vi people | – | Andrews (2010) | – |
Kesh | Future California | Humans | – | Le Guin (2016) | See text |
ʕuiʕuid | Toub | Humans | – | Dominique Bobeck | Arabic |
Orkish or Black Speech 2 is the language of the Orcs, an evil, ugly, and primitive race, bred by a dark power (see Tolkien, 2021, Appendix F). As with all of his invented languages, Tolkien designed the sound structure of Orkish to match the appearance of the Orcs, to achieve an “inner consistency of reality” (see Tolkien, 2008, p. 59). The following phonological features have been identified as prominent in Orkish: the occurrence of gutturals, such as uvular /ʀ/, frequent and long consonant clusters, buzzing voiced fricatives, more obstruents than sonorants, a large consonant to vowel ratio, dark back vowels, and the absence of the vowel /e/ (cf. Elsen, 2019; Flieger, 2017; Podhorodecka, 2007), giving the impression of a language that sounds “menacing, powerful, harsh as stone” (Tolkien, 2021, p. 254). The sentences included in this study are, however, Neo-Orkish, constructed by the linguist David Salo for the dialogues in The Lord of the Rings and The Hobbit films because there are not many original Orkish texts that have been created by Tolkien himself, apart from the inscription on Sauron’s Great Ring of Power, which we have excluded. 3
The Elvish language Quenya is probably the most elaborate imaginary language in Tolkien’s fantasy universe, and also best reflects his personal linguistic aesthetic (Destruel, 2016; Tolkien, 2016, p. 17). In Middle-earth, Elves are the highest and purest immortal race. Spoken by these otherworldly beings, High Elvish is therefore designed to sound aesthetically peaceful and pleasing. Tolkien himself acknowledged that he designed Quenya with “a poetic or otherwise markedly formal nature” (Tolkien, 2006, p. 232). The intended poetic beauty is achieved, for example, by the use of open central and long vowels, the avoidance of complex consonant clusters, the softening of stops by omission of aspirated voiceless stops, more sonorants than obstruents, an even distribution of vowels and consonants, a tendency toward CV syllable structure and few syllables per word, voiced stops only in company with sonorants and only in word medial position, and finally the use of the alveolar /r/ (see, for example, Destruel, 2016; Johannesson, 2007; Ryan, 2014; Tolkien, 2016). Verses from Namárië, a Quenya poem Tolkien composed in 1954, are used as experimental stimuli in this study (Tolkien, 2021, p. 377f). Regarding Tolkien’s inspiration from natural languages (cf. Tikka, 2007, pp. 17–20), he states in a letter: “It might be said to be composed on a Latin basis with two other (main) ingredients that happen to give me ‘phonaesthetic’ pleasure: Finnish and Greek” (Tolkien, 1981, no. 144).
Sindarin is the second major Elvish language of Tolkien’s universe. It has a Welsh-like system of consonant mutations, which historically changed most voiceless stops into their voiced or fricative counterparts, and still does so across word boundaries (Tolkien, 1981, no. 176). Its words are rich in long vowels and diphthongs. These features make the language very sonorous. As fragments of the language are ubiquitous in The Lord of the Rings, it was easy to select representative stimulus sentences, as long as they did not contain any words that obviously betray their origin.
Far less elaborate than Tolkien’s Elvish languages is Khuzdul, the language of the Dwarves. Notable features of Khuzdul phonology include the absence of an alveolar trill or tap and the use of a uvular trill or fricative instead, as well as a three-way contrast between voiceless-unaspirated, voiceless-aspirated, and voiced-unaspirated stops, with a defective labial series lacking /p/ and /pʰ/. Also noteworthy are the missing voiceless counterparts of /ɣ/ and /ʁ/ (see Maddieson, 1984, 47f). The syllable structure is moderately complex, with CVCC as the maximal syllable. According to Tolkien, “Dwarvish was both complicated and cacophonous” (Tolkien, 1981, no. 25). The phrases used here are not from Tolkien’s corpus, but from Neo-Khuzdul by a fan, The Dwarrow Scholar (Dwarrow, 2017), to avoid buzzwords.
Adûnaic is Tolkien’s language for the human population of the island of Númenor. Phonologically, Adûnaic was designed to have a “faintly Semitic flavour” (Tolkien, 2002, p. 240). Connections to Semitic languages include a heavy reliance on fricatives (labial, dental, alveolar, post-alveolar, velar, and glottal), contrastive vowel length (except for /eː/ and /oː/), the (presumed) insertion of a glottal stop before word-initial vowels, and geminate consonants, often morphologically conditioned. The most complex syllables seem to be CV(ː)C. The stimulus sentences were selected from the primary source for Adûnaic, a lament for the destruction of Númenor, which is included in the most complete description of the language: “Lowdham’s Report on the Adunaic Language” (Tolkien, 2002, pp. 413–440).
The three fictional languages Klingon, Vulcan, and Atlantean were all created by Marc Okrand. Klingon is the language of the Klingons, a fierce and honor-driven species of alien warriors from the science fiction franchise Star Trek. To convey the belligerence of the Klingons, Okrand deliberately designed a phonological system unlike any other in natural languages (see Okrand et al., 2011), including phonemes not found in English, such as the frequently used guttural and uvular consonants. According to Stockwell (2006, p. 9), these phonemes convey phonaesthetic harshness. In the first grammar of Klingon, Okrand describes the articulation of the grapheme 〈q〉 as the “sound of choking” and that of 〈Q〉 as “very guttural and raspy and strongly articulated” (cf. Okrand, 1992, pp. 14–15). The syllable structure of the Klingon words is CV(C) with frequent syllable-initial and/or syllable-final glottal stops. Our Klingon stimuli come from Okrand’s first two books on the Klingon language: The Klingon Dictionary (1992) and The Klingon Way (1996).
Vulcan, another language from the Star Trek universe, is spoken by the Vulcans, a humanoid species whose actions are driven by the principles of logic rather than emotion. The first sentences of the Vulcan language were created by Marc Okrand after the relevant scenes had already been filmed. The sound had to be dubbed to match the lip movements of the actors (Okrent, 2009, p. 231), resulting in a phonological system based largely “on sounds and sound combinations found in English” (Okrand et al., 2011, p. 113). According to Okrand, at this stage there was “no attempt to assign meaning to individual words or impose any sort of grammatical structure” (Okrand et al., 2011, p. 114). The grammar and vocabulary of the Vulcan language have been further developed by a number of dedicated fans, the most notable work being The Vulcan Language by Gardner and The Vulcan Language Institute (2004). This version of the language, known as Golic Vulcan, was used in the experiment. Although unusual consonant clusters are found in some Vulcan words, their use is mostly limited to personal names. We therefore avoided using them in the experiment.
The third conlang by Marc Okrand used in this study is Atlantean from the animated film Atlantis: The Lost Empire (2001). The story focuses on an expedition of adventurers in 1914 who travel to the lost place of Atlantis. Atlantean, the language of Atlantis, was constructed to resemble mankind’s proto-language in the biblical tradition of the confusion of tongues after Babel (Langmaker, 2008). With this in mind, Okrand included a wealth of Proto-Indo-European roots as a major part of the Atlantean lexicon. However, he did not adhere too strongly to PIE phonology. The phonological system is typologically quite common, with a “moderately small” consonant inventory (see Maddieson, 2013) of 15 consonants and only two series of stops that contrast voice (Shadlag, 2006). It has five phonemic vowels /a, ɛ, i, ɔ, u/ with no phonetic or phonological length distinction.
Dothraki is the language of a nomadic warrior people from G.R.R. Martin’s A Song of Ice and Fire universe. They are brutal and aggressive, and their lives revolve around horses. The author of the book series invented 56 words (see Peterson, 2015), many of which are names of people and locations. For the Game of Thrones TV adaptation, the producers hired the linguist David Peterson to expand the Dothraki language for dialogue. He based its sound inventory on the existing words but used harsh-sounding consonants, such as the velar and uvular sounds /x, k, ɡ, q/ and the back vowels /o, a/ in high-frequency words. He also wanted Dothraki to sound distinctly different from English. Therefore, vowel sequences are produced as distinct vowels rather than as diphthongal vowels, the alveolar consonants are realized as dental, and the trilled apical /r/ is used as a suffix in word-final position. Stress is placed on the final syllable when words end with a consonant.
Na’vi was developed by the linguist Paul Frommer for the 2009 film Avatar. In the film, the language is spoken by the 10-ft tall, blue, humanoid alien people called Na’vi, who live on the extrasolar moon Pandora. The medium itself, as well as the director James Cameron, placed various demands on the language, such as a word length comparable to English due to the time constraints of the film, as well as human-like sounds despite the physical differences (see Frommer, 2009). Cameron gave Frommer about 30 names for characters, places, and animals on which to base the sound patterns. Since the speaker community in the film is a nonhuman alien species, Frommer incorporated several rather rare features of human languages into Na’vi to emphasize its otherness and uniqueness, such as the inclusion of the three ejective consonants /k’, p’, t’/, which are rarely found in Western languages. Frommer has chosen stress as a distinctive feature, rather than vowel length and tone.
Kesh is the language spoken by the fictional Kesh people, who are the main subject of Ursula K. Le Guin’s (2016) pseudo-ethnographic novel Always Coming Home, set in Northern California in a distant, post-apocalyptic future. Although the Kesh as a people are modeled on Native Americans (Cain & Conley, 2006; Skowrońska, 2018), the phoneme inventory is mostly similar to English, except for the two ejectives /t’/ and /p’/. Moreover, the phoneme /r/ shows some unusual allophones compared with English: it can be realized as [r], [ɾ], [ð], [dr], and it “often” seems to be lost after vowels, with the vowel getting an r-color. Kesh words and a few phrases are scattered throughout the book and are collected in a glossary at the end, along with some additional material and notes on the Kesh writing system and phoneme inventory. A special edition of the novel also includes a cassette of recordings of songs in Kesh. Our stimuli, however, have been taken from the book and slightly modified.
Unlike the conlangs mentioned above, ʕuiʕuid is not known from a film, television, or literature. It is a language invented for the personal amusement of Dominique Bobeck, one of the co-authors of this article. Its phonology is characterized by a medium-sized phoneme inventory (22 consonants, 5 vowels) and a very symmetrical system of obstruents regarding the features [voice] and [continuant]. Furthermore, one of its hallmarks is the presence of two pharyngeal fricatives. Otherwise, ʕuiʕuid has no typologically rare consonants. Concerning vowels, the abundance of different rising and falling diphthongs is remarkable. In terms of syllable structure, the most preferred syllable is CV. Codas are permitted, but branching onsets and codas do not exist.
1.4 Aims of this study
The aim of the current study is twofold: first, we want to investigate whether the conlangs described above evoke different impressions depending on how they sound. As was shown by Reiterer et al. (2020) for natural languages, ratings on several scales, such as pleasantness or status, are influenced not only by the sound structure, but also by the participants’ familiarity with a given language and by the individual voice characteristics of the different speakers. By testing conlangs, we hope to minimize the possible effects of familiarity and intelligibility. As was found recently by Malik-Moraleda et al. (2023), conlangs are processed similarly to natural languages and draw on the same neural mechanisms. Therefore, we do not expect general differences in language processing between unknown natural languages and conlangs. In addition, the voice factor is controlled here by using two model speakers who produce all the stimuli of each language. In Reiterer et al. (2020), different speakers produced the samples for each language to have stimuli spoken by native speakers. In our study, the stimuli were all produced in a voice without emotion and without additional sound effects that might, for example, enhance the impression of evilness in films. If we find a consistent pattern of ratings, we can conclude that the phonological structure of a language alone elicits different impressions. This would confirm the inherent value hypothesis (see Giles et al., 1979). Based on the inventors’ intentions and previous studies, we hypothesize that Orkish, Klingon, Khuzdul, and Dothraki should be evaluated more negatively than Sindarin and Quenya. For the other six conlangs, the predictions are not clear because the inventors did not specify what impression they wanted their language to make.
The second aim is to establish a relationship between the ratings and the phonological and phonetic properties of the conlangs studied. Based on previous findings, we predict that sonority and related characteristics (percentage of CV syllables, percentage of vowels, percentage of voicing) will be positively correlated with the ratings. Conlangs with many guttural sounds and back vowels, as well as sounds not found in the participants’ native language, should be rated negatively. According to Kogan and Reiterer (2021), emotive effects may play a role in the perception of speech rate and fundamental frequency. Faster speech rate and greater tonal variation are associated with positive emotions such as happiness and excitement, whereas sadness, depression, and anger are often associated with slower speech and the use of a more monotonous tonal contour.
2 Method
2.1 Stimuli
We used three sentences from each conlang (as described in Section 1.3). All sentences are provided with transcriptions, translations, and glosses in Appendix A. The following criteria were used to select stimulus sentences: (1) they should not contain any well-known buzzwords (e.g., Khaleesi for Dothraki), and (2) they should consist of at least 10 syllables. Each sentence was recorded by a female and a male model speaker in a quiet room at a 44100 Hz sampling rate. The speakers were instructed to speak in a neutral voice as in normal read speech, that is, they should not speak with dramatic intonation. The intensity level of all stimuli was adjusted via a Praat script that scaled the mean intensity of each file to 70 dB. The sampling rate was then adjusted to 22050 Hz to optimize the bandwidth for the online experiment. In total, 75 stimuli were presented during the experiments (3 sentences × 12 languages × 2 speakers, with 3 additional examples at the beginning of the experiment for testing and volume adjustment, the ratings of which were not included in the analysis).
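The intensity normalization step can be illustrated with a minimal sketch. It is not the Praat script used in the study: it assumes that mean intensity is derived from the root-mean-square amplitude relative to a reference pressure of 2 × 10⁻⁵ Pa and that samples are expressed in pascals. The function names are hypothetical.

```python
import math

REFERENCE_PRESSURE = 2e-5  # assumed reference pressure in Pa


def intensity_db(samples):
    """Mean intensity in dB, computed from the RMS amplitude."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(rms / REFERENCE_PRESSURE)


def scale_to_db(samples, target_db=70.0):
    """Multiply all samples by one gain factor so the mean intensity equals target_db."""
    gain = 10.0 ** ((target_db - intensity_db(samples)) / 20.0)
    return [x * gain for x in samples]


if __name__ == "__main__":
    # Hypothetical test tone: 0.1 s of a 220 Hz sine at 22050 Hz.
    tone = [0.01 * math.sin(2 * math.pi * 220 * t / 22050) for t in range(2205)]
    print(round(intensity_db(tone), 2), round(intensity_db(scale_to_db(tone)), 2))
```

Because a single gain factor is applied to the whole file, the relative dynamics within each stimulus are preserved; only the overall level changes.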
2.2 Procedure
We employed the stimuli in two similar online rating experiments using the browser-based Percy platform (Draxler, 2017). The participants were recruited by sending the links to email lists of several universities, to family, friends, colleagues, or by posts on social media. The instruction language of the experiments was German, so the recruited participants are mainly native speakers and advanced learners of German.
Prior to the main part of the experiment, the following metadata were assessed: gender (choices: female, male, non-binary, other), age, the federal state in Germany or the country of enrollment at school, first language(s), second language(s), language(s) of parents, dialect spoken, background in linguistics, educational qualifications, input and output devices and in which surroundings participants performed the experiments (at home, at the office, in a studio, at a public place, en route somewhere).
In the main part, participants rated one stimulus at a time (see Figure 1). By clicking on the headphone symbol, they could listen to the stimulus once. Each stimulus could only be played twice. The main task for each listener was to rate each stimulus according to the impression it made on them on three 7-point semantic differential scales: the first scale was goodness (with one evil and seven good), the second peacefulness (with one aggressive and seven peaceful) and the third pleasantness (with one unpleasant and seven pleasant). The scales were ordered on top of each other and were randomly reordered with each new stimulus. The 72 experimental stimuli were presented in random order as well as counterbalanced for each participant. Rating without playing the stimulus was not possible. No time limit was set for the rating. A pause was included after 36 stimuli.
Figure 1.
Screenshot from the experiment.
After finishing the main part, the participants were asked whether they had recognized any of the languages. They then had the opportunity to listen to one additional stimulus of each language, following which they could type their guess as to the language identity within a text field. Since participants only entered a specific language for 12% of the stimuli, the results of this part will not be considered here.
Initially, the participants were not told that they were hearing stimuli from constructed languages because we wanted to avoid any influence which knowledge about the fictional nature of our stimuli might have had (e.g., the association between Orkish and Orcs in the movies). However, after receiving the feedback that some participants found it very difficult to rate a language as good or evil because of concerns over potential racism, we modified our instructions. To ease these concerns and avoid potential bias, we subsequently told participants in the modified experiment that they would hear fictional languages from the Fantasy and Science Fiction genres, but not which specific languages. In the analysis the initial group of participants was identified as the “NoInst” (no instruction) group, and the second group as the “Fantasy” group. Since the ratings were not significantly affected by the instructions, the “NoInst” and “Fantasy” groups were pooled together for further analyses.
2.3 Participants
In both sections of the experiment, 232 participants filled out the metadata questionnaire and started the online experiment, 151 for the NoInst part and 81 for the Fantasy part. We excluded 36 participants who were not native speakers of German and 56 participants who finished before rating at least one stimulus of each language. In addition, seven participants were excluded because they showed little variation in their ratings, as determined by a standard deviation of the overall ratings lower than 0.6. In total, 87 participants from the NoInst group and 42 participants from the Fantasy group remained for further investigation. Metadata about the participants are summarized in Table 2. For background in linguistics we used the following categories: “degree” if the participants stated that they had at least a BA in linguistics or related fields, “undergraduate” if they are currently taking classes in linguistics but have not finished yet, and “no” if they stated no background or only an interest in linguistics.
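The low-variation exclusion criterion can be sketched as follows. The sketch assumes the sample standard deviation over all of a participant's ratings; the text does not state which estimator was used. The participant IDs and ratings are invented for illustration.

```python
from statistics import stdev


def low_variation_participants(ratings_by_participant, threshold=0.6):
    """Return IDs of participants whose overall rating SD is below the threshold."""
    return sorted(
        pid
        for pid, ratings in ratings_by_participant.items()
        if len(ratings) > 1 and stdev(ratings) < threshold
    )


if __name__ == "__main__":
    # Hypothetical ratings on the 7-point scales.
    example = {
        "p01": [4, 4, 4, 5, 4, 4],  # almost no variation
        "p02": [1, 7, 3, 5, 2, 6],  # uses the full scale
    }
    print(low_variation_participants(example))
```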
Table 2.
Characteristics of Participants.
Characteristic | All | NoInst | Fantasy |
---|---|---|---|
Gender | |||
Female | 84 | 55 | 29 |
Male | 44 | 32 | 12 |
Non-binary | 1 | 0 | 1 |
Age | |||
M | 33.58 | 36.86 | 26.79 |
SD | 14.99 | 14.86 | 12.66 |
Range | 17–71 | 17–71 | 18–67 |
Background in linguistics | |||
Degree | 21 | 19 | 2 |
Undergraduate | 35 | 13 | 22 |
No background | 73 | 55 | 18 |
Education | |||
Secondary school | 5 | 2 | 3 |
High school | 55 | 22 | 33 |
Academic degree | 68 | 62 | 5 |
Other | 1 | 1 | 0 |
2.4 Phonological and phonetic metrics
In this section, we describe the calculation of the phonological and phonetic characteristics. Cross-linguistically, the pleasant sound of a language correlates with larger overall sonority (Johannesson, 2007; Reiterer et al., 2020). For our study, we adapted the sonority index by Fought et al. (2004) from American English for the stimuli from the conlangs investigated here. Different sonority values are assigned to each speech sound group according to its manner of articulation, voicing, and vowel height. As shown in Table 3, the values decrease from 100 for open vowels to 2 for voiceless and voiced stops. Several indices as well as the syllable structure are derived from this classification.
Table 3.
Sonority Scale adapted from Fought et al. (2004).
Class | Label | Scale value | Examples |
---|---|---|---|
A | Low vowels | 100 | [a, ɔ, æ] |
O | Mid back vowels | 80 | [o, ɤ] |
E | Mid nonback vowels | 69 | [e, ɛ, ə] |
U | High back vowels | 65 | [u, ɯ] |
I | High nonback vowels | 41 | [i, ɪ] |
J | Semivowels and approximants | 27 | [j, w] |
R | Rhotic consonants | 36 | [r, ɾ, ɻ] |
L | Lateral sonorants | 17 | [l, ʎ, ɫ] |
N | Nasal sonorants | 9 | [m, n, ŋ] |
S | Voiceless fricatives | 4 | [f, s, θ] |
Z | Voiced fricatives | 3 | [v, z, ð] |
D | Voiced stops | 2 | [b, d, ɡ] |
T | Voiceless stops | 2 | [p, t, k] |
To calculate the syllable rate of each stimulus, syllable counts were derived from the phonological transcription of the stimuli as follows: syllable boundaries were determined by the sonority values, occurring prior to a local sonority minimum with a following rise (see Johannesson, 2007).
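The syllabification rule can be sketched in a few lines. The sketch works on the class labels of Table 3 and makes two assumptions that go beyond the description above: each word is syllabified separately, and a boundary is only placed when the material before it already contains a vowel. The function name is hypothetical.

```python
# Sonority values as printed in Table 3 (class label -> scale value).
SONORITY = {
    "A": 100, "O": 80, "E": 69, "U": 65, "I": 41, "J": 27, "R": 36,
    "L": 17, "N": 9, "S": 4, "Z": 3, "D": 2, "T": 2,
}
VOWELS = set("AOEUI")


def syllabify(word):
    """Split a word (string of class labels) before each local sonority minimum
    that is followed by a rise in sonority."""
    values = [SONORITY[c] for c in word]
    syllables, start = [], 0
    for i in range(1, len(word) - 1):
        is_minimum = values[i] <= values[i - 1]
        rises_after = values[i + 1] > values[i]
        # Assumption: do not split off a chunk that has no vowel nucleus.
        has_nucleus = any(c in VOWELS for c in word[start:i])
        if is_minimum and rises_after and has_nucleus:
            syllables.append(word[start:i])
            start = i
    syllables.append(word[start:])
    return syllables


if __name__ == "__main__":
    for word in ["ARSARAZOONUN", "AZADDARA", "AJALOOIJADA"]:
        print(".".join(syllabify(word)))
```

The syllable count of a stimulus is then the total number of syllables over all of its words.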
In addition, every stimulus was automatically annotated with the classes to which the segments belong, long vowels counting twice. From these annotated stimuli the metrics described below were calculated. The expected relationship of these metrics to the ratings is also given in brackets, with a “+” sign for an expected positive correlation and a “–” sign for a negative correlation:
SonInd (+)—the sonority index is the mean sonority, according to Fought et al. (2004), of the stimulus, that is, the sum of the scale values (see Table 3) for each phoneme in the stimuli, divided by the number of phonemes.
SonIndC (+)—the sonority index for consonants per stimulus, as above, but only taking into account the consonants, that is, excluding the categories A, O, E, U, and I (see Table 3).
PctOpenSyll (+)—the percentage of syllables (as defined above) whose last segment is a vowel, that is, syllables that do not have a consonantal coda.
PctVowels (+)—the percentage of vowels is the share of vowels (categories A, O, E, U, and I as above) out of all segments. Long vowels and diphthongs count as two vowels.
PctObstr (−)—the percentage of obstruents is the share of obstruents (categories S, Z, D, and T) out of all segments of the stimulus.
PctObstrOfC (−)—the percentage of obstruents out of all consonants: calculated as above, but only with respect to the consonants.
PctBackVOfV (−)—the percentage of back vowels is the share of back vowels (categories O and U) out of all vocalic segments.
PctGutt (−)—the percentage of guttural sounds, that is, the share of “guttural” segments, that is, any uvular, pharyngeal, or glottal segments, out of all segments in the stimuli.
PctGuttVel (−)—the percentage of guttural and velar segments like PctGutt but also including velar segments.
PctNonGerman (−)—the percentage of sounds that do not occur in the sound inventory of German (as described by Kohler, 1990), which was the most frequent L1 of the study participants. The regional allophonic variants [r, r] of Standard German /ʁ/ are included.
Table 4 shows the metrics applied to an Adûnaic stimulus.
Table 4.
Outcome of the Metrics for an Adûnaic Stimulus.
Transcription | [arfaraˈzoːnun aˈzagːara awaˈloːijada] |
---|---|
Classes | AR.SA.RA.ZOO.NUN.A.ZAD.DA.RA.A.JA.LOOI.JA.DA |
SonInd | 55.81 |
SonIndC | 14.57 |
PctOpenSyll | 78.57% |
PctVowels | 54.84% |
PctObstr | 19.35% |
PctObstrOfC | 42.86% |
PctBackVOfV | 29.41% |
PctGutt | 0% |
PctGuttVel | 6.45% |
PctNonGerman | 3.23% (because of [w]) |
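The class-based metrics can be recomputed from the annotation string of Table 4 with a minimal sketch. It covers only the metrics that follow from the class labels alone; PctGutt, PctGuttVel, and PctNonGerman require place-of-articulation information that the labels do not carry. The function name is hypothetical, and the sketch assumes that the stimulus contains at least one vowel and one consonant.

```python
# Sonority values as printed in Table 3 (class label -> scale value).
SONORITY = {
    "A": 100, "O": 80, "E": 69, "U": 65, "I": 41, "J": 27, "R": 36,
    "L": 17, "N": 9, "S": 4, "Z": 3, "D": 2, "T": 2,
}
VOWELS = set("AOEUI")
OBSTRUENTS = set("SZDT")
BACK_VOWELS = set("OU")


def class_metrics(annotation):
    """Compute class-based metrics from a dot-separated class annotation."""
    syllables = annotation.split(".")
    segments = [c for syllable in syllables for c in syllable]
    vowels = [c for c in segments if c in VOWELS]
    consonants = [c for c in segments if c not in VOWELS]
    obstruents = [c for c in segments if c in OBSTRUENTS]

    def pct(part, whole):
        return 100.0 * part / whole

    return {
        "SonInd": sum(SONORITY[c] for c in segments) / len(segments),
        "SonIndC": sum(SONORITY[c] for c in consonants) / len(consonants),
        "PctOpenSyll": pct(sum(s[-1] in VOWELS for s in syllables), len(syllables)),
        "PctVowels": pct(len(vowels), len(segments)),
        "PctObstr": pct(len(obstruents), len(segments)),
        "PctObstrOfC": pct(len(obstruents), len(consonants)),
        "PctBackVOfV": pct(sum(c in BACK_VOWELS for c in vowels), len(vowels)),
    }


if __name__ == "__main__":
    adunaic = "AR.SA.RA.ZOO.NUN.A.ZAD.DA.RA.A.JA.LOOI.JA.DA"
    for name, value in class_metrics(adunaic).items():
        print(f"{name}: {value:.2f}")
```

Applied to the annotation in Table 4, the sketch yields the five percentage values listed there. The two sonority indices depend on the exact scale value assigned to each class and may deviate slightly from the published figures.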
The acoustic signals of the recordings were used for several phonetic measurements. Voiced and voiceless intervals of each stimulus were annotated manually with the value “+” and “–” by means of Praat (Boersma & Weenink, 2019). The fundamental frequency contours were calculated based on the measured voiced stretches by applying the periodicity detection algorithm by Schäfer-Vincent (1983), as implemented in the R package wrassp (see Bombien et al., 2020; Winkelmann et al., 2017). The F0 ranges were set to 80–500 Hz for the female speaker and to 50–400 Hz for the male speaker.
PctVoiced (+)—percentage of voiced intervals relative to speech intervals (excluding pauses).
SyllRateN (+)—syllable count (see above) divided by the duration of the speech interval. This value was then z-scored (i.e., the speaker-specific mean was subtracted and the result was divided by the speaker-specific standard deviation) to abstract from speaker-specific speech rate differences.
F0N (+)—median F0 of the F0 contours during the voiced stretches as a measure of relative tone level differences. This value was z-scored to abstract from speaker-specific differences.
IQRF0N (+)—the interquartile range of the distribution of F0 values as a robust measure of tonal modulation within each stimulus. This value was z-scored to abstract from speaker-specific differences.
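The speaker-specific normalization shared by SyllRateN, F0N, and IQRF0N can be sketched as follows. The sketch assumes the sample standard deviation, which the text does not specify, and the values in the usage example are invented.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_by_speaker(records):
    """Z-score values within each speaker.

    records: list of (speaker, value) pairs, one per stimulus.
    Returns a list of (speaker, z) pairs in the same order.
    """
    by_speaker = defaultdict(list)
    for speaker, value in records:
        by_speaker[speaker].append(value)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    stats = {s: (mean(v), stdev(v)) for s, v in by_speaker.items()}
    return [(s, (v - stats[s][0]) / stats[s][1]) for s, v in records]


if __name__ == "__main__":
    # Hypothetical syllable rates (syllables per second) for two speakers.
    rates = [("f", 4.0), ("f", 5.0), ("f", 6.0), ("m", 3.0), ("m", 3.5), ("m", 4.0)]
    for speaker, z in zscore_by_speaker(rates):
        print(speaker, round(z, 2))
```

After this step, a value of zero means "typical for this speaker", so the remaining variation reflects differences between stimuli instead of differences between the two voices.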
2.5 Statistics
The three rating scales were rescaled to −3 to 3 by subtracting the midpoint 4 and multiplying by −1, so that positive values correspond to positive terms on the scales. To test whether the ratings were affected by the constructed language, we first calculated linear mixed effects models using the lme4 package v1.1-30 in R (see Bates et al., 2015; R Core Team, 2020). We tested whether the dependent variable Rating differed for the scale type (three scales: pleasantness, goodness, and peacefulness), for the language (12 levels), and for the speaker (2 speakers: 1 female and 1 male). The factor Speaker was included based on the role of voice for judging natural languages found in, for example, Anikin et al. (2023) and Reiterer et al. (2020). Participant (N = 129) and sentence (N = 36: three sentences in 12 languages) were included as random intercepts as recommended by Judd et al. (2012) to deal with the low number of stimuli per language. Speaker was included as a random slope for sentence. Post hoc tests were calculated by Wilcoxon pair comparisons using the R package rstatix v0.7.1 (see Kassambara, 2022). For an easier interpretation, the 12-level factor language was sorted by average ratings per scale. Furthermore, due to this multilevel factor, only effects with adjusted p-values (using Holm’s method) smaller than .001 were considered to be significant.
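Holm’s step-down adjustment applied to the pairwise comparisons can be illustrated with a short sketch. It is not the rstatix implementation used in the study, and the p-values in the usage example are invented.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    raw = [0.0002, 0.04, 0.0001, 0.3]  # hypothetical raw p-values
    adjusted = holm_adjust(raw)
    significant = [p < 0.001 for p in adjusted]
    print(adjusted, significant)
```

With 12 languages there are 66 pairwise comparisons per scale, so the smallest raw p-value is multiplied by 66 before it is compared with the .001 criterion.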
To assess which of the phonetic and phonological parameters best predicts the rating scores, we calculated regressions using linear mixed models with random effects per scale. To avoid collinearity between the predictors, we selected them based on theoretical considerations, as explained in the Results section (as recommended by Smith, 2006). Since the phonological characteristics were calculated based on the transcriptions for each sentence, we included sentence as a random intercept. For the phonetic characteristics, that were measured for each of the 72 stimuli (36 sentences produced by two speakers), stimulus was entered as a random intercept. The second random intercept was always listener. For each model, the marginal (Rm2) and conditional coefficients of determination (Rc2) were calculated (package MuMIn v.1.47.1 Bartoń, 2022). Marginal Rm2 corresponds to the variance explained by the fixed factors, conditional Rc2 corresponds to the variance explained by both fixed and random factors (Nakagawa & Schielzeth, 2013).
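The two coefficients of determination can be written out as a small sketch for a Gaussian model with random intercepts, following the variance decomposition of Nakagawa and Schielzeth (2013). The variance components in the usage example are invented, and the function name is hypothetical; in the study these values were obtained with the MuMIn package.

```python
def r_squared_mixed(var_fixed, var_random, var_residual):
    """Marginal and conditional R^2 for a Gaussian random-intercept model.

    var_fixed:    variance of the fixed-effect predictions
    var_random:   list of random-intercept variances (e.g., listener, sentence)
    var_residual: residual variance
    """
    total = var_fixed + sum(var_random) + var_residual
    marginal = var_fixed / total
    conditional = (var_fixed + sum(var_random)) / total
    return marginal, conditional


if __name__ == "__main__":
    # Hypothetical variance components.
    rm2, rc2 = r_squared_mixed(0.2, [0.5, 0.3], 1.0)
    print(round(rm2, 3), round(rc2, 3))
```

A large gap between the two values indicates that most of the explained variance comes from the random intercepts, that is, from differences between listeners and between sentences or stimuli.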
3 Results
3.1 Ratings
As a first overview, Figure 2 shows the mean ratings pooled for the three scales pleasantness, goodness, and peacefulness since they are highly correlated (pleasantness–goodness: .64***, pleasantness–peacefulness: .65***, goodness–peacefulness: .76***). The color coding is based on the impressions of the languages intended by their inventors (see Table 1) if known. Clearly, the participants in this experiment showed differential ratings as intended for most languages, with Klingon and Dothraki (in orange) being the most negatively rated and Quenya and Sindarin (in green) the most positively rated. Khuzdul and Orkish are rated more positively than expected based upon the intentions of the inventors. The ratings were not completely consistent within each language but varied for different stimuli, as can be seen in Figure B1 in Appendix B.
Figure 2.
Mean ratings with standard errors, averaged for scales, sorted by mean and color-coded for intended impression (see Table 1). Rating scale ranges from −3 to 3.
To analyze whether the differences between languages are significant, linear mixed effects models were calculated (see Section 2.5 and Appendix C). Since there was a significant interaction between ScaleType and Language, separate models were calculated for the three different scales. The effects of Language on the three scales are shown in Figure 3. Significant differences are only shown for the closest differing means, for instance, for the pleasantness ratings Sindarin is rated significantly more pleasant than Vulcan which implies that it is also more pleasant than all languages with smaller ratings than Vulcan (e.g., left of Vulcan, ʕuiʕuid, Orkish). Although the ratings and the order of languages differ in the scales, there are some consistent effects: Klingon, Dothraki, and Na’vi are often rated negatively or around zero. The Elvish languages Quenya and Sindarin, as well as Kesh and Vulcan, are rated more positively on all three scales. The other languages vary in-between. The three scales differ with respect to the extent: pleasantness showed more extreme values, especially in the negative direction, than the other two scales.
Figure 3.
Mean ratings in ascending order with standard errors per scale. Only significant differences of closest neighbors are shown. Scales range from −3 to 3.
Regarding the two speakers, the ratings for the female speaker (red line in Figure 4) are steeper and also show more extreme values than those for the male speaker (blue line). The largest difference between the two speakers can be found for Adûnaic on all three scales, and Quenya was consistently rated more positively when produced by the female speaker.
Figure 4.
Modeled fits and standard errors for pleasantness, goodness, and peacefulness scales for the two speakers f and m. Scales range from −3 to 3.
In general, the results from the ratings give evidence that the raters are able to detect differences between the languages and rated them based on these. Furthermore, the rating results confirm most of our expectations: Klingon and Dothraki were consistently rated negatively whereas the two Elvish languages, Sindarin and Quenya, were rated most positively (at least for the female speaker). This is mostly in agreement with the intention of the inventors of these languages. Orkish and Khuzdul, however, seem to sound surprisingly pleasant, good and peaceful despite Tolkien’s intention to make them sound harsh.
3.2 Relationship between ratings and phonological and phonetic characteristics
In this section, the relationship between the ratings and the phonological and phonetic characteristics of the stimuli is investigated by means of linear mixed effects regression analyses. Phonetic and phonological characteristics are analyzed in separate models because the phonological characteristics are extracted per sentence (i.e., three sentences per language), whereas the phonetic characteristics are measured per stimulus (i.e., two recordings of each sentence). The averaged phonological characteristics per language are shown in Appendix D, Table D1 sorted by “intended impression.” To avoid collinearity, we first calculate the correlation coefficients between the phonological variables. Table 5 shows the Pearson product-moment correlation coefficient with the significance levels (dark gray: p < .01, light gray: p < .05). As can be seen, the sonority related variables SonInd, SonIndC and PctVowels are positively correlated with each other and negatively correlated with the consonantal variable PctObstr. Furthermore, the variables PctObstr, PctObstrOfC, PctGutt, PctGuttVel, and partly PctNonGerman are positively correlated with each other. Based on the correlation structure and theoretical considerations we select three variables: PctGuttVel, PctNonGerman, and SonInd. PctGuttVel is selected because the existence of guttural and velar sounds in languages has often been associated with “harshness” (see, for example, Stockwell, 2006). Additionally, PctNonGerman is included as a measure of “strangeness” and SonInd because it is not correlated with the other variables or shows only weak correlations and has been used in previous studies on natural languages (see, for example, Fought et al., 2004; Reiterer et al., 2020). Since the ratings differ significantly for the three different rating scales (see Section 3.1), separate models are calculated per scale.
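The collinearity screening step can be illustrated with a short sketch that computes Pearson correlations between per-sentence metric vectors. The values in the usage example are invented and are not the study's data; the helper names are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def correlation_table(metrics):
    """Map each pair of metric names to its Pearson correlation."""
    return {
        (a, b): pearson_r(metrics[a], metrics[b])
        for a, b in combinations(metrics, 2)
    }


if __name__ == "__main__":
    # Hypothetical per-sentence values for three metrics.
    example = {
        "SonInd": [50.0, 55.0, 52.0, 58.0],
        "PctObstr": [30.0, 22.0, 27.0, 18.0],
        "PctNonGerman": [5.0, 0.0, 12.0, 3.0],
    }
    for pair, r in correlation_table(example).items():
        print(pair, round(r, 2))
```

Pairs with a high absolute correlation are candidates for dropping one of the two variables before they enter the same regression model.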
Table 5.
Correlation Coefficients and Significance Levels for the Phonological Characteristics for All Sentences (N = 36) (Values in Dark Indicate Statistically Significant Results for p < .01 and in Light Gray for p < .05).
Table 6 shows the results of the regressions for the phonological variables. As can be seen, only the variable PctNonGerman has a significant negative effect on ratings, that is, languages with sound systems that are similar to German are more positively rated. The slope of this relationship is steeper for the pleasantness scale compared with the goodness and peacefulness scales. This is also shown in Figure 5 as a scatterplot of PctNonGerman and the ratings averaged over sentences for each scale. Sentences with more non-native sounds were rated as more unpleasant, more evil, and more aggressive than stimuli with more native sounds. Interestingly, and contrary to what was found in previous studies (see, for example, Reiterer et al., 2020; Stockwell, 2006), the variables SonInd and PctGuttVel did not contribute significantly to the prediction of the ratings. Figure 5 suggests that the significant relationship of PctNonGerman could be driven by Klingon (pink squares with crosses), whose stimuli have up to 45% of non-German sounds. Indeed, excluding the values for Klingon changes the results: the slopes for the goodness and peacefulness ratings are no longer significant. For the pleasantness scale, the slope for PctNonGerman stays significant after excluding the values for Klingon.
Table 6.
Results From Linear Mixed Models of the Rating Scales as Dependent Variable and Phonological Characteristics as Independent Variables (Number of Observations = 9,012) With Participant (129) and Sentence (36) as Random Intercepts.
Predictor | Estimate | SE | df | t value | p |
---|---|---|---|---|---|
Pleasantness scale | |||||
(Intercept) | 0.40 | 0.66 | 32.79 | 0.61 | |
SonInd | 0.01 | 0.02 | 32.01 | 0.51 | |
PctGuttVel | −0.02 | 0.01 | 32.01 | −1.83 | |
PctNonGerman | −0.03 | 0.01 | 32.01 | −3.63 | *** |
Rm2/Rc2 | 0.100 | 0.478 | |||
Goodness scale | |||||
(Intercept) | 0.78 | 0.54 | 32.81 | 1.44 | |
SonInd | −0.00 | 0.01 | 32.00 | −0.16 | |
PctGuttVel | −0.01 | 0.01 | 32.01 | −0.71 | |
PctNonGerman | −0.02 | 0.01 | 32.00 | −2.20 | * |
Rm2/Rc2 | 0.025 | 0.384 | |||
Peacefulness scale | |||||
(Intercept) | 0.76 | 0.76 | 32.41 | 1.00 | |
SonInd | 0.00 | 0.02 | 32.00 | 0.10 | |
PctGuttVel | −0.01 | 0.01 | 32.00 | −0.66 | |
PctNonGerman | −0.02 | 0.01 | 32.00 | −2.20 | * |
Rm2/Rc2 | 0.043 | 0.420 |
*** p < .001; ** p < .01; * p < .05.
Figure 5.
Scatterplot for ratings (averaged per sentence) and the percentage of non-German sounds, color-coded for language. Pearson correlation coefficients based on the averaged data are also given with the values. Rating scales range from −3 to 3.
The phonetic characteristics PctVoiced, SyllRateN, F0N, and IQRF0N are calculated for each of the 72 stimuli. For SyllRateN, F0N, and IQRF0N, we use normalized values because we are not interested in the difference between the speakers but in the variation within each speaker per stimulus. Table 7 reports the correlation coefficients between the phonetic variables. The linear mixed effects models are calculated per scale and include speaker as a fixed factor. The results are shown in Table 8. For all three scales, PctVoiced has a significant positive slope with more positive ratings for more voicing in the stimuli (see Figure 6). Furthermore, for the pleasantness scale, the normalized tone level (F0N) has a significant negative slope. In other words, the lower the normalized fundamental frequency, the more pleasantly the stimulus is rated. The ratings on the goodness scale are positively correlated with the normalized variation of the fundamental frequency, namely, the smaller the variation, the flatter the F0 contours, the more evil the stimuli were rated. SyllRateN and IQRF0N do not affect the ratings significantly. For all three scales, the fixed effect speaker is significant with lower ratings for the male speaker. Klingon again seems to drive the significant slope for PctVoiced. For the pleasantness scale, excluding Klingon still yields significant slopes for PctVoiced and F0N and a significant main effect for speaker. For the other two scales, the slopes are not significant when excluding Klingon.
Table 7.
Correlation Coefficients and Significance Levels for the Phonetic Characteristics, Based on All Stimuli (N = 72).
*** p < .001; ** p < .01; * p < .05.
Table 8.
Results From Linear Mixed Models of the Rating Scales as Dependent Variable and Phonetic Characteristics as Independent Variables (Number of Observations = 9,012) With Participant (129) and Sentence (36) as Random Intercepts.
Term | Estimate | SE | df | t value | p |
---|---|---|---|---|---|
Pleasantness scale | |||||
(Intercept) | −1.78 | 0.44 | 69.70 | −4.06 | *** |
PctVoiced | 0.03 | 0.01 | 66.00 | 4.98 | *** |
SyllRateN | 0.01 | 0.07 | 66.00 | 0.21 | |
F0N | −0.18 | 0.07 | 65.99 | −2.64 | * |
IQR_F0N | −0.04 | 0.07 | 66.00 | −0.57 | |
speakerm | −0.49 | 0.13 | 66.00 | −3.81 | *** |
R²m/R²c | 0.010 | 0.481 | |||
Goodness scale | |||||
(Intercept) | −0.34 | 0.39 | 69.27 | −0.86 | |
PctVoiced | 0.01 | 0.01 | 66.01 | 2.45 | * |
SyllRateN | −0.01 | 0.06 | 66.01 | −0.14 | |
F0N | −0.07 | 0.06 | 66.00 | −1.15 | |
IQR_F0N | 0.05 | 0.06 | 66.01 | 0.82 | |
speakerm | −0.28 | 0.11 | 66.02 | −2.48 | * |
R²m/R²c | 0.024 | 0.409 | |||
Peacefulness scale | |||||
(Intercept) | −0.73 | 0.52 | 67.80 | −1.40 | |
PctVoiced | 0.02 | 0.01 | 66.00 | 2.68 | ** |
SyllRateN | −0.05 | 0.08 | 66.00 | −0.60 | |
F0N | −0.09 | 0.08 | 66.00 | −1.11 | |
IQR_F0N | 0.02 | 0.08 | 66.01 | 0.20 | |
speakerm | −0.31 | 0.15 | 66.01 | −2.03 | * |
R²m/R²c | 0.037 | 0.443 |
*** p < .001; ** p < .01; * p < .05.
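The two R² values in the last row of each model are read here as the marginal and conditional coefficients of determination for mixed models. Assuming they were obtained with the MuMIn package listed in the references, and using notation introduced only for this illustration (σ²_f for the variance of the fixed-effect predictions, σ²_ε for the residual variance), they can be written as:

```latex
R^2_m = \frac{\sigma^2_f}
             {\sigma^2_f + \sigma^2_{\text{participant}} + \sigma^2_{\text{sentence}} + \sigma^2_\varepsilon},
\qquad
R^2_c = \frac{\sigma^2_f + \sigma^2_{\text{participant}} + \sigma^2_{\text{sentence}}}
             {\sigma^2_f + \sigma^2_{\text{participant}} + \sigma^2_{\text{sentence}} + \sigma^2_\varepsilon}
```

Under this reading, a low marginal value next to a much higher conditional value (for example, 0.010 versus 0.481 for the pleasantness scale) would indicate that most of the explained variance is attributable to the random intercepts for participant and sentence rather than to the phonetic predictors.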
Figure 6.
Scatterplot for mean ratings and the percentage of voiced sounds (PctVoiced) for the female speaker (top) and the male speaker (bottom), color-coded for language. Rating scales range from −3 to 3.
To summarize so far, the relationship between the ratings and the phonological and phonetic characteristics of the stimuli is strongest for the pleasantness scale, probably because this scale shows the largest range of rating values. The linear mixed effects models show significant relationships between all rating scales and the characteristics PctNonGerman and PctVoiced. The ratings are negatively correlated with the percentage of non-German sounds, which means that stimuli with many non-German sounds are rated as more unpleasant and, to a lesser degree, as more evil and aggressive. In addition, the ratings are positively correlated with PctVoiced; in other words, stimuli with more voiced sounds are rated more positively. However, for the goodness and the peacefulness ratings, these relationships are mainly driven by the stimuli from Klingon, which contain a large percentage of non-German sounds and many voiceless sounds. For the pleasantness scale, the effects of PctNonGerman and PctVoiced are more stable and do not depend on Klingon.
4 Discussion
The first aim of this study was to investigate whether listeners judge stimuli from several fictional languages as sounding more positive or negative even when the stimuli were produced in a voice without emotional involvement. Our results from the rating experiment indicate that listeners indeed do so. For example, they perceive Klingon and Dothraki as significantly more unpleasant, evil, and aggressive than the Elvish languages Sindarin and Quenya, as intended by the inventors of these languages. The most positively rated stimuli on all three scales were from Quenya, confirming Beinhoff (2023), while Klingon provided the most unpleasant, evil, and aggressive stimuli. However, our expectations for Orkish and Khuzdul were not confirmed: they sound more positive than expected. We will discuss this below, as several other points can be illustrated by our results on Orkish.
Since our conlangs are unknown to most people (at least as far as their sound is concerned), most participants in this study had to base their judgments solely on the sound structure of the stimuli presented. This is consistent with the studies by Moreau et al. (2014) and Hilton et al. (2022) on natural languages that were unfamiliar to the raters. Anikin et al. (2023) demonstrated that, even in cases of misidentification, languages perceived as familiar were more strongly preferred than languages that listeners reported as unfamiliar. One drawback of our experimental design is that we cannot be completely sure that the participants did not recognize some of the stimuli while rating them. Since the participants were supposed to focus on the impression a stimulus makes and not on identifying it, we separated the two tasks and used different sentences for the guessing part, which came after the rating part. As a result, we could not examine participants’ perceived familiarity with the rated stimuli. However, in the guessing part most stimuli were not identified correctly. 8 Therefore, we can largely rule out the effects of social traits or imposed norms on the ratings. Nonetheless, the Elvish languages were rated more positively than Klingon and Dothraki, even though the listeners based their evaluations on very few and very short stimuli per language. This supports the inherent value hypothesis (Giles et al., 1979; Trudgill & Giles, 1978) and the sound-driven hypothesis (Van Bezooijen, 2002).
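The figures reported in the footnote on the guessing part (984 stimuli, a specific language entered for 12% of them, 30% of those entries correct) can be combined into an approximate overall identification rate. The short calculation below is only a back-of-the-envelope sketch: the percentages in the footnote are rounded, so the resulting counts are approximate, and the variable names are chosen for this illustration.

```python
n_stimuli = 984        # stimuli presented in the guessing part
share_named = 0.12     # share for which a specific language was entered
share_correct = 0.30   # share of those entries that were correct

# Approximate counts implied by the rounded percentages
n_named = n_stimuli * share_named
n_correct = n_named * share_correct

# Correct identifications as a share of all guessing-part stimuli
overall_correct = n_correct / n_stimuli

print(f"specific guesses: about {n_named:.0f}")
print(f"correct guesses: about {n_correct:.0f}")
print(f"correct share of all stimuli: {overall_correct:.1%}")
```

On these numbers, only a few percent of the guessing-part stimuli were correctly identified, which is consistent with the statement that recognition played at most a minor role.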
Our second goal was to investigate the relationship between the sound structure of the stimuli and the ratings in detail. We found that the structural characteristic PctNonGerman had a significant impact on the perception of the languages. This was most pronounced and consistent for the pleasantness scale, whereas the effects on the goodness and peacefulness scales were weaker and mainly driven by Klingon. The acoustic measure PctVoiced influenced the ratings in a positive direction. In addition, for the pleasantness scale, the stimuli were rated more negatively when spoken at a higher normalized pitch (see Table 8). Thus, the listeners preferred languages consisting of native sounds with more voiced sounds and a lower tone level. However, in our study, contrary to the assumptions and findings of previous studies on conlangs (e.g., Elsen, 2019; Johannesson, 2007; Podhorodecka, 2007), the phonological characteristics related to sonority (SonInd) and consonants produced in the back of the throat did not significantly predict the impressions of these languages. Based on previous studies, we would also have expected an effect of (high) front vowels and (low) back vowels on the perception of conlangs. /i, e/ are typically associated with smallness and brightness, while /u, o/ are associated with size and darkness (Crystal, 1995; Winter & Perlman, 2021), leading us to assume that these vowels would make a stimulus sound more positive or negative. However, this could not be confirmed by our data. As shown in Table D1 (see Appendix D), some of the phonological features, such as the percentage of back vowels PctBVOfV and the sonority indices SonInd and SonIndC, do not exhibit a large range of values for the tested sentences.
A larger corpus per language (as, for example, in Anikin et al., 2023; Reiterer et al., 2020, for natural languages) would probably yield more variation between languages; the phonological characteristics in the current study are derived from three short sentences per language. Another possible reason for our null results, inter alia, for SonInd and PctGuttVel, is that the above-mentioned studies (Elsen, 2019; Johannesson, 2007; Podhorodecka, 2007) compare conlangs with regard to their phonological characteristics, for example, whether conlangs designed to sound harsh differ in phonological characteristics from those designed to sound pleasing. Our study applies a different methodology, predicting ratings from these measures, similar to studies on natural languages, such as Anikin et al. (2023), Kogan and Reiterer (2021), and Reiterer et al. (2020).
Although the rating results on all three scales are similar in terms of the general direction of the ratings and the order of preference, the ratings are not identical. The results show less extreme ratings for the goodness and peacefulness scales compared with the pleasantness scale. In general, on the goodness and peacefulness scales, the negative deflections of the ratings tend to be smaller than the positive ones (see Figure 4).
For the goodness scale, one possible explanation is that at least some participants were reluctant to rate a language negatively. Initially, we did not give the participants any information about the nature of our languages, namely, whether they were natural or fictional languages. The feedback we received suggests that some participants felt uncomfortable with the goodness scale because they felt it was racist to rate a language as evil. Following this feedback, we tried to avoid this moral effect by explaining in the instructions that they would be listening to fictional languages from the Fantasy and Science Fiction genres (see Section 2.2). However, this framing of the experiments did not significantly affect the range of the ratings on the scales. It may therefore be that it is more natural and therefore easier for people to perceive languages as “unpleasant” than as “evil” or “aggressive,” and our participants were accordingly reluctant to use negative values for these scales. This is a first factor that interferes with the effect of the sound of the examined languages (inherent value).
The second factor that significantly influenced the ratings was the speaker who produced the stimuli. In Reiterer et al.’s (2020) rating study on European languages, voice was the most important factor, explaining 70% of the variance in the rating results. In their experiment, the stimuli from each language were produced by different speakers because they wanted them to be produced by native speakers of those languages. For obvious reasons, we could not use native speakers for our study. The stimuli for all languages were therefore produced by two model speakers, a female and a male speaker, both German linguistics students, who practiced the sentences based on phonetic transcriptions. They were instructed to produce all utterances without emotional involvement in their voice. In our experiment, we were thus able to control for the voice factor much better than Reiterer et al. (2020) in their natural language rating study. Nevertheless, we also found that the female and the male speakers were rated differently. In particular, the two Elvish languages, Sindarin and especially Quenya, were rated more positively for the female speaker than for the male speaker. Klingon was rated as more evil and aggressive when produced by the female speaker, and there were some differences for Atlantean and Adûnaic (see Figure 4). In general, the stimuli of the female speaker were rated with more extreme values compared with those of the male speaker. Whether these interindividual differences can be attributed to the gender of the speakers cannot be answered with the current experiment, as we only used one female and one male speaker. Thus, the results may simply be a consequence of speaker-specific differences that are unrelated to their gender. We would need more speakers of each gender to investigate this.
To our surprise, Orkish (and partly also Khuzdul) was rated as much more pleasant, good, and peaceful than intended by Tolkien (see Tolkien, 2021) or demonstrated in previous studies such as Johannesson (2007), Podhorodecka (2007), and Elsen (2019). At first glance, this seems to be an argument against the inherent value hypothesis, since Orkish should have a bad, evil, and aggressive sound, but on closer inspection this outcome actually supports it. We drew on Neo-Orkish phrases created by David Salo for The Lord of the Rings and The Hobbit films to avoid the use of well-known Orkish phrases such as the ring inscription: Ash nazg durbatulûk, ash nazg gimbatul, ash nazg thrakatulûk, agh burzum-ishi krimpatul 9 (Tolkien, 2021, p. 254). The words in the ring verse, which has been the basis for previous studies, such as Podhorodecka (2007), consist mainly of syllables with many obstruents in the coda. Our Neo-Orkish stimuli, by contrast, have more open syllables (PctOpenSyll = 65.6%, measured as described in Section 2.4, averaged per language, shown in Table D1 in Appendix D) than Quenya (52%) and Sindarin (32%). The same applies to PctVowels, where Neo-Orkish is more similar to Sindarin and Quenya than to Klingon and Dothraki. So perhaps the sound of Neo-Orkish does not match the evil sound of Tolkien’s original Black Speech samples, so that its sound (inherent value) leads to a more positive evaluation than would be expected from the social traits of the Orkish speakers, their appearance, and the intentions of the inventors Tolkien and Salo.
It may also be that our speakers performed the Orkish stimuli in too human a way. In other words, they did not imitate the rough, hissing, and snarling pronunciation of Orcs because we interpreted these features as special sound effects. However, at least David Salo may have relied on them as inherent features of Orkish speech to achieve an evil Orkish sound in the movies, although we do not have any information that would indicate or confirm that this is the case. It could be interesting in a future study to investigate how the evaluation of Orkish changes when the stimuli are produced in a more threatening and more “Orkish” way and to compare these results with the evaluation of, for example, Quenya produced in a threatening way. This could shed light on the impact of prosodic features and on the interplay of prosodic features and the segmental level (for the importance of prosodic features for the iconicity of ideophones see also Dingemanse et al., 2016).
Alternatively, the rating results for Orkish may depend on our Orkish sample: we may have accidentally selected stimuli that happen not to sound evil. Dingemanse et al. (2016) and Van Hoey et al. (2023) noticed that even in natural languages, such as Korean and Igbo, not every ideophone is equally suitable for testing iconic effects. Hence, a preselection would be beneficial for enhancing reliability. This is an inherent methodological problem of our study, which also applies to all the other languages we have investigated here. We have based our study on three short sentences per language. We also tried to avoid buzzwords so that participants could not easily identify a language. This further limited the set of possible sentences, as there were not many accessible sentences in some languages, which precluded us from preselecting test stimuli. This means that we have a very small corpus per language, which may not be representative of the language as a whole. As a result, our findings may be more valid for the sentences alone than for the whole language from which the sentences are taken. Hence, we have to restrict the significance of our results concerning our first aim, namely, to evaluate whether conlangs consistently evoke the impressions intended for them when spoken without emotional involvement, sound effects, and so on. We cannot be sure that our results do not depend on our limited selection of stimuli. However, apart from the case of Orkish and, to a lesser extent, Khuzdul, our expectations with respect to how a conlang should be perceived have been confirmed in general, and our results are valid for the phonetic properties as we examined them over all 72 stimuli in the regression analysis. This sample is large enough for making claims concerning the second aim of our study.
Finally, the positive impression of Orkish could be influenced by the native language of the participants of our study, which was different from Tolkien’s. It could be that Orkish was designed to sound evil primarily to English native speakers. As was shown by Reiterer et al. (2020), the sonority index for German is only slightly lower than for English and should not contribute to an effect of the native language. However, the more common consonant clusters, the uvular trill/voiced fricative /ʀ/~/ʁ/ and the voiceless uvular fricative /χ/ in Orkish may sound evil or unpleasant to English native speakers, but because these sounds belong to the inventory of the German consonant system and follow German phonotactics, this may not be the case for native speakers of German. Our Orkish stimuli contain even fewer non-German sounds than Quenya and Sindarin (see Table D1). More generally, the PctNonGerman metric significantly affected the ratings of all scales (see Table 6), with more positive ratings for less foreign-sounding languages. It is interesting to consider whether the notion of sounding strange or foreign derives from the languages with which participants are familiar (i.e., the familiarity-driven hypothesis, cf. Peterson, 2015; Stanley, 2003; Van Bezooijen, 2002), or whether it is influenced by phonological universals associated with markedness (defined by the frequency of sounds and sound combinations in the world’s languages). This notion has been discussed in an essay by Brunner (2014), who suggests that Italian is considered a beautiful and pleasant language because it is composed of sounds that occur in most languages, that is, the Italian sound inventory contains very few phonemes that are rare in other languages. However, no experimental evidence is presented for this claim. It is furthermore not supported by our results because languages with typologically rare sounds, such as ʕuiʕuid and Sindarin, are among the better rated languages.
We therefore assume that the notion of strangeness, foreignness, or otherness derives from the languages with which participants are familiar (cf. Mooshammer et al., 2023).
Familiarity was also a crucial factor for positive ratings in Reiterer et al.’s (2020) study, but had no effect in Van Bezooijen’s (2002) study. While both studies based their assessment of familiarity on ratings, they used different methodologies (laypeople rating language samples in the former study and trained phoneticians rating languages without audio samples in the latter). However, it is not clear whether their rating results on familiarity are comparable to our measure of PctNonGerman because it is built more on similarity or dissimilarity than on familiarity. Thus, taking the significant effect of PctNonGerman into account, we hypothesize that sound similarity or dissimilarity to the participants’ L1 is another factor, besides the purely phonetic measures, that influences the assessment of a given language. However, data from participants with different L1s are needed to test this hypothesis (see Section 5) and to exclude an effect of typological universals.
In addition to PctNonGerman, PctVoiced and F0N (only for the pleasantness scale) also had an impact on the participants’ rating scores. In contrast to the PctNonGerman metric, these metrics may potentially have a cross-linguistically universal validity, since PctNonGerman is a relational metric that takes the German phoneme inventory as a reference point, while PctVoiced and F0N do not. So they could have a similar effect on speakers of other languages as they do on speakers of German. If we were to repeat the experiment with speakers of another language, we would, however, expect a corresponding PctNonLanguage metric, computed relative to that language, to correlate with the results in the way PctNonGerman does in this study. A cross-linguistic effect of the non-relational metrics could perhaps depend on some kind of perceptual universals. The example of maluma and takete may hint in this direction, since the two words differ in the voicing of the consonants, which might cause the association with a round or spiky shape (see, for example, Lockwood & Dingemanse, 2015). Such universals could be caused by the association of certain speech sounds with universal non-speech sounds and with the contexts in which they are produced or by some kind of general synaesthetic-like association of sounds with proximity or distance or with characteristics associated with these states (Podhorodecka, 2007; Stanley, 2003).
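What it means for a metric to be relational can be illustrated with a small sketch. The code below is not the procedure used in this study (the actual computation of PctNonGerman is described in the Methods section); it simply assumes that the metric is the percentage of segment tokens in a transcription that are absent from a reference inventory. The function name `pct_non_native`, the toy inventories, and the transcription are all hypothetical.

```python
def pct_non_native(segments, native_inventory):
    """Percentage of segment tokens that are absent from the reference inventory."""
    if not segments:
        raise ValueError("segments must not be empty")
    foreign = sum(1 for seg in segments if seg not in native_inventory)
    return 100.0 * foreign / len(segments)


# Toy, deliberately incomplete inventories (hypothetical, for illustration only)
inventory_a = {"p", "t", "k", "a", "i", "u", "x"}
inventory_b = {"p", "t", "k", "a", "i", "u", "q"}

# Hypothetical transcription of one stimulus as a list of segments
stimulus = ["q", "a", "q", "a", "x", "i"]

pct_a = pct_non_native(stimulus, inventory_a)  # the two "q" tokens are foreign
pct_b = pct_non_native(stimulus, inventory_b)  # only "x" is foreign
print(round(pct_a, 1), round(pct_b, 1))
```

The same stimulus receives a different value depending on the reference inventory, whereas a measure such as the percentage of voiced sounds would stay the same for every listener group. This is the sense in which only the non-relational metrics are candidates for cross-linguistic validity.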
5 Conclusion
Overall, our results confirm our expectation that the evaluation of unfamiliar conlangs would depend on their inherent phonological and phonetic characteristics, supporting the inherent value hypothesis (Giles et al., 1979; Trudgill & Giles, 1978) and the sound-driven hypothesis (Van Bezooijen, 2002). Furthermore, we were able to identify the relevant inherent phonological and phonetic variables that led to these judgments—PctNonGerman, F0N, and PctVoiced, which affected the pleasantness ratings significantly, even after excluding the exceptional language Klingon. Because of the significant effect of PctNonGerman, we conclude that otherness or dissimilarity with the sound inventory of a language of reference like the L1 plays a significant role in the evaluation of a given unknown language, limiting the impact of the inherent value or sound-driven hypotheses, as this does not correspond to an inherent value of the sound of that language. Instead, the familiarity-driven hypothesis may be indirectly supported (cf. Van Bezooijen, 2002), since one is necessarily familiar with such a language of reference. To test the universality of the inherent phonological and phonetic variables identified here, we are currently repeating the online experiment with instructions in 14 (partly) typologically different languages (English, German, Italian, Spanish, Hungarian, Turkish, Bulgarian, Russian, Arabic, Georgian, Mandarin Chinese, Cantonese, Japanese, and Khoekhoe) and eliciting responses from speakers of these languages.
Acknowledgments
The authors express their gratitude to Christoph Draxler for his ongoing and patient support on the Percy Platform and to Mark Tiede, the editor, and the reviewers for very valuable comments on a previous version. Furthermore, they thank Daniela Palleschi for advice on statistics. Great thanks are owed to Felicitas Wegmann for recording the stimuli. They are also grateful to Erica Conti, Linda Dingfeld, Corinna Egdorf, Vanessa Gaebel, Isabella Greimel, Franziska Groth, Gajaneh Hartz, Stephanie Jandt, Saskia Körner, Runzhi Lou, Alena Maurer, Sophie Neumann, Lucie Petit, Melanie Roy, Anna Sondermann, Katja Sonnert, Paul Staubesandt, Hannah Warnemünde, Marie-Theres Weißgerber, and Pui Yee Yuen for assistance in material preparation and audio annotation. Parts of this study were presented at the 17th Phonetik und Phonologie Tagung at the University of Frankfurt in September 2021, at Reading Fictional Languages: A Symposium at the University of Nottingham in March 2022, at the 18th Phonetik und Phonologie Tagung at the University of Bielefeld in October 2022, and at the Iconicity Seminar at the Hong Kong Polytechnic University in November 2022.
Appendix A
Fully glossed stimuli
The stimuli used in this study are glossed following the Leipzig Glossing Rules (see Comrie et al., 2015). Additionally, emph stands for emphatic, vblz for verbaliser, and pert for pertensive. The audio recordings can be found at https://doi.org/10.17605/OSF.IO/3DQAJ Stimuli.
Appendix B
Ratings per Stimulus
Figure B1.
Mean ratings per stimulus with standard errors (scales pooled, sorted by mean, and color-coded for language). Rating scale ranges from −3 to 3.
Appendix C
Linear Mixed Effects models
Full model
Model equation:
rating ~ language * scale + speaker + (1 | listener) + (1 + speaker | sentence)
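Read under the usual lme4 formula conventions, this model equation can be written out as follows. The notation is introduced here for illustration only and assumes dummy coding of the speaker factor: i indexes listeners, j stimuli, and k scales; ℓ(j) and m(j) denote the language and the sentence of stimulus j, and s_j ∈ {0, 1} codes its speaker.

```latex
\begin{aligned}
\text{rating}_{ijk} ={}& \beta_0 + \beta^{\text{lang}}_{\ell(j)} + \beta^{\text{scale}}_{k}
  + \beta^{\text{lang}\times\text{scale}}_{\ell(j),k} + \beta^{\text{spk}} s_j \\
 &+ u_i + v_{0,m(j)} + v_{1,m(j)}\, s_j + \varepsilon_{ijk}, \\
u_i \sim{}& \mathcal{N}(0, \sigma^2_u), \qquad
(v_{0,m}, v_{1,m})^\top \sim \mathcal{N}(\mathbf{0}, \Sigma_v), \qquad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2_\varepsilon)
\end{aligned}
```

The term u_i is the random intercept per listener, while v_0 and v_1 are the random intercept and the random speaker slope per sentence, so that the difference between the two speakers is allowed to vary from sentence to sentence.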
Table C1.
ANOVA table for all scales, using Satterthwaite’s method; number of observations = 27,036.
Term | Sum Sq | Mean Sq | NumDF | DenDF | F value | Pr(>F) |
---|---|---|---|---|---|---|
language | 109.21 | 9.93 | 11.00 | 24.00 | 8.24 | <0.0001 |
scale | 625.58 | 312.79 | 2.00 | 26811.74 | 259.69 | <0.0001 |
speaker | 14.51 | 14.51 | 1.00 | 35.00 | 12.04 | 0.0014 |
language:scale | 662.34 | 30.11 | 22.00 | 26811.74 | 25.00 | <0.0001 |
R²m/R²c | 0.137 | 0.425 |
Models for each scale
Model equations:
rating ~ language * speaker + (1 | listener) + (1 + speaker | sentence)
Table C2.
ANOVA table for the pleasantness, goodness, and peacefulness scales, using Satterthwaite’s method; number of observations = 9,012.
Term | Sum Sq | Mean Sq | NumDF | DenDF | F value | Pr(>F) |
---|---|---|---|---|---|---|
Pleasantness | ||||||
language | 277.26 | 25.21 | 11.00 | 35.46 | 19.90 | <0.0001 |
speaker | 62.02 | 62.02 | 1.00 | 36.05 | 48.96 | <0.0001 |
language:speaker | 34.39 | 3.13 | 11.00 | 36.06 | 2.47 | 0.0203 |
R²m/R²c | 0.171 | 0.476 | ||||
Goodness | ||||||
language | 69.67 | 6.33 | 11.00 | 24.00 | 6.21 | 0.0001 |
speaker | 15.85 | 15.85 | 1.00 | 24.01 | 15.53 | 0.0006 |
language:speaker | 46.51 | 4.23 | 11.00 | 24.01 | 4.14 | 0.0017 |
R²m/R²c | 0.097 | 0.413 | ||||
Peacefulness | ||||||
language | 85.44 | 7.77 | 11.00 | 24.00 | 6.80 | <0.0001 |
speaker | 10.41 | 10.41 | 1.00 | 24.01 | 9.12 | 0.0059 |
language:speaker | 46.01 | 4.18 | 11.00 | 24.02 | 3.66 | 0.0038 |
R²m/R²c | 0.15 | 0.45 |
Appendix D
Phonological Characteristics
Table D1.
Mean values for phonological characteristics per language, sorted by the intention of the inventors.
Language | SonInd | SonIndC | PctOpenSyll | PctVowels | PctObstr | PctObstrOfC | PctGutt | PctGuttVel | PctBVOfV | PctNonGerman | Intention |
---|---|---|---|---|---|---|---|---|---|---|---|
Sindarin | 39.00 | 9.78 | 31.91 | 44.70 | 21.12 | 37.73 | 0.00 | 3.31 | 27.22 | 4.53 | pleasing |
Quenya | 40.52 | 12.51 | 51.85 | 47.28 | 16.22 | 30.77 | 0.00 | 1.39 | 17.17 | 2.72 | pleasing |
Orkish | 34.84 | 6.48 | 65.56 | 43.83 | 39.00 | 68.96 | 7.15 | 22.84 | 36.16 | 2.47 | harsh |
Khuzdul | 37.45 | 7.41 | 68.18 | 45.57 | 31.64 | 58.10 | 13.28 | 20.49 | 15.54 | 0.00 | harsh |
Klingon | 30.70 | 6.84 | 31.67 | 33.98 | 53.88 | 81.85 | 20.53 | 29.06 | 30.83 | 37.01 | harsh |
Dothraki | 39.42 | 9.27 | 49.89 | 41.51 | 32.44 | 55.50 | 5.79 | 11.78 | 12.38 | 5.43 | harsh |
Adûnaic | 48.29 | 9.98 | 70.63 | 53.84 | 25.34 | 55.14 | 1.11 | 4.37 | 24.31 | 1.08 | unknown |
Vulcan | 36.66 | 9.28 | 50.32 | 40.26 | 35.29 | 59.15 | 3.77 | 8.17 | 21.12 | 2.02 | unknown |
Na’vi | 36.35 | 9.30 | 63.33 | 42.28 | 32.61 | 56.94 | 3.13 | 14.85 | 35.12 | 12.76 | unknown |
Atlantean | 33.28 | 8.43 | 57.04 | 40.08 | 33.02 | 55.06 | 1.42 | 7.65 | 17.35 | 3.12 | unknown |
Kesh | 38.07 | 8.07 | 64.02 | 48.52 | 34.09 | 65.46 | 9.41 | 16.76 | 25.03 | 2.94 | unknown |
ʕuiʕuid | 45.51 | 11.55 | 77.66 | 52.00 | 27.96 | 59.44 | 3.14 | 15.69 | 36.66 | 9.34 | unknown |
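The language means in Table D1 help to show why Klingon can dominate the relationships involving the percentage of non-German sounds. The following sketch copies the language-level values of that column from the table and compares Klingon with the remaining languages. It operates on language means only, not on the per-stimulus values used in the regression models, and the variable names are chosen for this illustration.

```python
# Percentage of non-German sounds per language, copied from Table D1
pct_non_german = {
    "Sindarin": 4.53, "Quenya": 2.72, "Orkish": 2.47, "Khuzdul": 0.00,
    "Klingon": 37.01, "Dothraki": 5.43, "Adûnaic": 1.08, "Vulcan": 2.02,
    "Na’vi": 12.76, "Atlantean": 3.12, "Kesh": 2.94, "ʕuiʕuid": 9.34,
}

# Rank languages from most to fewest non-German sounds
ranked = sorted(pct_non_german.items(), key=lambda kv: kv[1], reverse=True)
top_language, top_value = ranked[0]
runner_up, runner_up_value = ranked[1]

# Mean over all languages except the top-ranked one
others = [v for lang, v in pct_non_german.items() if lang != top_language]
mean_others = sum(others) / len(others)

print(top_language, top_value)
print(runner_up, runner_up_value)
print(round(mean_others, 2))
```

Klingon's value is almost three times that of the next language, Na’vi, and far above the mean of the other eleven languages, so a single language sits at the extreme end of the predictor range. This fits the observation that some slopes lose significance once Klingon is excluded.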
This method of calculating the average sonority is described in more detail in Section 2.4 (see Table 3).
Strictly speaking, Black Speech is the language invented by Sauron and Orkish is derived from it with a less complex morphology but similar phonology (Tolkien, 2021, Appendix F).
Subsequently, we do not distinguish between Orkish and Neo-Orkish, and use only Orkish, unless we want to refer unambiguously to one of the two languages. The same applies to Khuzdul and Neo-Khuzdul, below.
The remarks on these allophones are not quite clear in the book. Therefore, we had to set likely rules by ourselves to specify the pronunciation of our Kesh examples.
The grammar can be found at https://doi.org/10.17605/OSF.IO/3DQAJ.
The stimuli can be found at https://doi.org/10.17605/OSF.IO/3DQAJ Stimuli.
The code for these analyses can be found here: https://doi.org/10.17605/OSF.IO/3DQAJ metrics.
In all, 84 participants carried out the guessing part of the experiment (see Section 2.2). They entered a specific language for only 12% of the 984 stimuli in the guessing part, and 30% of these entries were correct guesses. Excluding the participants who identified the languages correctly did not change the results presented here.
Translation: “One Ring to rule them all, One Ring to find them, One Ring to bring them all, and in the darkness bind them.”
Footnotes
Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethics and consent: The experiments run for this paper followed the standard ethics guidelines regarding consent and voluntary participation. All participants were informed about the experimental procedure and future data processing before providing their consent. Participation was voluntary and the participants could stop participating at any point.
ORCID iD: Christine Mooshammer
https://orcid.org/0000-0002-7836-1741
Supplementary material: The stimuli, data, R scripts and the grammar of ʕuiʕuid can be found in https://doi.org/10.17605/OSF.IO/3DQAJ
References
- Andrews S. (2010). Paul Frommer sounds off on Avatar language. https://www.campfirewriting.com/learn/interview-paul-frommer
- Anikin A., Aseyev N., Erben Johansson N. (2023). Do some languages sound more beautiful than others? Proceedings of the National Academy of Sciences, 120(17), Article e2218367120. 10.1073/pnas.2218367120 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Annear L. (2020). Vowel category and meanings of size in Tolkien’s early lexicons. Journal of Tolkien Research, 9(2), Article 5. https://scholar.valpo.edu/journaloftolkienresearch/vol9/iss2/5/ [Google Scholar]
- Axén B. (2019). In the shadow of Elvish—The Black Speech and Orkish: Peter Jackson’s Films. https://zhaaburi.wordpress.com/peter-jacksons-movies/
- Bartoń K. (2022). Mumin: Multi-model inference [R package version 1.47.1]. https://cran.r-project.org/web/packages/MuMIn/index.html
- Bates D., Mächler M., Bolker B., Walker S. (2015). Fitting linear mixed-effects models using Ime4. Journal of Statistical Software, 67(1), 1–48. 10.18637/jss.v067.i01 [DOI] [Google Scholar]
- Beinhoff B. (2015). Why are alien languages inherently human? Foundation, 44(122), 5–19. [Google Scholar]
- Beinhoff B. (2023). Design intentions and actual perception of fictional languages: Quenya, Sindarin and Na’vi. In Noletto I., Norledge J., Stockwell P. (Eds.), Reading fictional languages. Edinburgh University Press. [Google Scholar]
- Bloomfield L. (1909). A semasiological differentiation in Germanic secondary ablaut. The University of Chicago. [Google Scholar]
- Boersma P., Weenink D. (2019). Praat: Doing phonetics by computer. http://www.praat.org/
- Bombien L., Winkelmann R., Scheffers M. (2020). Wrassp: An R wrapper to the ASSP library [R package version 0.1.9]. https://cran.r-project.org/web/packages/wrassp/index.html
- Brunner B. (2014). The sound of difference: Why we find some languages more beautiful than others. https://www.thesmartset.com/article03041401/
- Cain S., Conley T. (2006). Encyclopedia of fictional and fantastic languages. Greenwood Publishing Group. [Google Scholar]
- Comrie B., Haspelmath M., Bickel B. (2015). The Leipzig glossing rules: Conventions for interlinear terlinear morpheme-by-morpheme glosses. Department of Linguistics of the Max Planck Institute for Evolutionary Anthropology & the Department of Linguistics of the University of Leipzig. https://www.eva.mpg.de/lingua/pdf/Glossing-Rules.pdf [Google Scholar]
- Crystal D. (1995). Phonaesthetically speaking. English Today, 11(2), 8–12. [Google Scholar]
- Ćwiek A., Fuchs S., Draxler C., Asu E. L., Dediu D., Hiovain K., Kawahara S., Koutalidis S., Krifka M., Lippus P., Lupyan G., Oh G. E., Paul J., Petrone C., Ridouane R., Reiter S., Schümchen N., Szalontai Á., Ünal-Logacev Ö., Winter B. (2022). The bouba/kiki effect is robust across cultures and writing systems. Philosophical Transactions of the Royal Society B, 377(1841), Article 20200390. [DOI] [PMC free article] [PubMed] [Google Scholar]
- de Saussure F. (1916). Cours de linguistique générale. Payot. [Google Scholar]
- Destruel M. (2016). Reality in fantasy: Linguistic analysis of fictional languages [Master’s Thesis, Boston College, Morrissey College of Arts and Sciences Graduate School; ]. http://hdl.handle.net/2345/bc-ir:107144 [Google Scholar]
- Dingemanse M., Schuerman W., Reinisch E., Tufvesson S., Mitterer H. (2016). What sound symbolism can and cannot do: Testing the iconicity of ideophones from five languages. Language, 92(2), 117–133. 10.1353/lan.2016.0034 [DOI] [Google Scholar]
- Draxler C. (2017). PercyConfigurator—Perception experiments as a service. In Proceedings of Interspeech 2017 (pp. 823–824). ISCA Archive. [Google Scholar]
- Dufter A., Reich U. (2003). Rhythmic differences within Romance: Identifying French, Spanish, European and Brazilian Portuguese. In Solé M. J., Recasens D., Romero J. (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences (pp. 2781–2784). ICPhS Archive.
- Dwarrow T. S. (2017). The Dwarrow scholar. https://www.dwarrowscholar.com/home.html
- Elsen H. (2019). Lautsymbolik in phantastischen Sprachen. Wirkendes Wort, 69, 103–119.
- Flieger V. (2017). The Orcs and the others: Familiarity as estrangement in The Lord of the Rings. In Vaccaro C., Kisor Y. (Eds.), Tolkien and alterity. The new middle ages (pp. 205–222). Palgrave Macmillan. https://doi.org/10.1007/978-3-319-61018-4_10
- Fónagy I. (1991). Paralinguistic universals and preconceptual thinking in language. In Waugh L. R., Rudy S. (Eds.), New vistas in grammar: Invariance and variation (pp. 495–515). John Benjamins Publishing Company.
- Fónagy I. (1999). Why iconicity? In Nänny M., Fischer O. (Eds.), Form miming meaning. Iconicity in language and literature (pp. 3–36). John Benjamins Publishing Company.
- Fought J. G., Munroe R. L., Fought C. R., Good E. M. (2004). Sonority and climate in a world sample of languages: Findings and prospects. Cross-Cultural Research, 38(1), 27–51. https://doi.org/10.1177/1069397103259439
- Frommer P. (2009). Some highlights of Na’vi. https://languagelog.ldc.upenn.edu/nll/?p=1977
- Gardner M. R., & The Vulcan Language Institute. (2004). The Vulcan language [unpublished]. http://surak.nu/vulcanlanguage.pdf
- Giles H., Bourhis R., Davies A. (1979). Prestige speech styles: The imposed norm and inherent value hypotheses. In McCormack W., Wurm S. (Eds.), Language and society (pp. 589–596). Mouton Publishers.
- Giles H., Niedzielski N. (1998). German sounds awful, but Italian is beautiful. In Bauer L., Trudgill P. (Eds.), Language myths (pp. 85–93). Penguin.
- Gymnich M. (2005). Reconsidering the linguistics of Middle-earth: Invented languages and other linguistic features in JRR Tolkien’s The Lord of the Rings. In Honegger T. (Ed.), Reconsidering Tolkien (pp. 7–30). Walking Tree Publishers.
- Hilton N. H., Gooskens C., Schüppert A., Tang C. (2022). Is Swedish more beautiful than Danish? Matched guise investigations with unknown languages. Nordic Journal of Linguistics, 45(1), 30–48.
- Johannesson N.-L. (2007). Quenya, the Black Speech and the sonority scale. In Stenström A. (Ed.), Proceedings of the first international conference on J.R.R. Tolkien’s invented languages (pp. 14–21). The Arda Society.
- Judd C. M., Westfall J., Kenny D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54. https://doi.org/10.1037/a0028347
- Kassambara A. (2022). Rstatix: Pipe-friendly framework for basic statistical tests [R package version 0.7.1]. https://CRAN.R-project.org/package=rstatix
- Kogan V. V., Reiterer S. M. (2021). Eros, beauty, and phon-aesthetic judgements of language sound. We like it flat and fast, but not melodious. Comparing phonetic and acoustic features of 16 European languages. Frontiers in Human Neuroscience, 15, 1–22.
- Köhler W. (1947). Gestalt psychology: An introduction to new concepts in modern psychology. Liveright.
- Kohler K. (1990). German. Journal of the International Phonetic Association, 20(1), 48–50. https://doi.org/10.1017/S0025100300004084
- Langmaker. (2008). Atlantean metahistory. https://web.archive.org/web/20080707004131/http://www.langmaker.com/atlanteanmetahistory.htm
- Leemann A., Kolly M.-J., Nolan F. (2015). It’s not phonetic aesthetics that drives dialect preference: The case of Swiss German. In Proceedings of ICPhS 2015, ed. The Scottish Consortium for ICPhS, Glasgow.
- Le Guin U. K. (2016). Always coming home. Gollancz.
- Lockwood G., Dingemanse M. (2015). Iconicity in the lab: A review of behavioral, developmental, and neuroimaging research into sound-symbolism. Frontiers in Psychology, 6, 1–14. https://doi.org/10.3389/fpsyg.2015.01246
- Maddieson I. (1984). Patterns of sounds. Cambridge University Press.
- Maddieson I. (2013). Consonant inventories. In Dryer M. S., Haspelmath M. (Eds.), The world atlas of language structures online. Max Planck Institute for Evolutionary Anthropology. https://wals.info/chapter/6
- Malik-Moraleda S., Taliaferro M., Shannon S., Jhingan N., Swords S., Peterson D. J., Frommer P., Okrand M., Sams J., Cardwell R., Freeman C., Fedorenko E. (2023). Constructed languages are processed by the same brain mechanisms as natural languages. bioRxiv. https://doi.org/10.1101/2023.07.28.550667
- McLean B., Dunn M., Dingemanse M. (2023). Two measures are better than one: Combining iconicity ratings and guessing experiments for a more nuanced picture of iconicity in the lexicon. Language and Cognition. Advance online publication. https://doi.org/10.1017/langcog.2023.9
- Mooshammer C., Bobeck D., Hornecker H., Meinhard K., Olina O., Walch M. C., Xia Q. (2023). The phonaesthetics of constructed languages: Results from an online rating experiment. In Noletto I., Norledge J., Stockwell P. (Eds.), Reading fictional languages. Edinburgh University Press.
- Moreau M.-L., Thiam N., Harmegnies B., Huet K. (2014). Can listeners assess the sociocultural status of speakers who use a language they are unfamiliar with? A case study of Senegalese and European students listening to Wolof speakers. Language in Society, 43(3), 333–348.
- Nakagawa S., Schielzeth H. (2013). A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2), 133–142.
- Okrand M. (1992). The Klingon dictionary (2nd ed.). Pocket Books.
- Okrand M. (1996). The Klingon way: A warrior’s guide. Pocket Books.
- Okrand M., Adams M., Hendriks-Hermans J., Kroon S. (2011). Wild and whirling words: The invention and use of Klingon. In Adams M. (Ed.), From Elvish to Klingon: Exploring invented languages (pp. 111–134). Oxford University Press.
- Okrent A. (2009). In the land of invented languages. Random House.
- Peterson D. J. (2015). The art of language invention: From horse-lords to Dark Elves to sand worms, the words behind world-building. Penguin.
- Podhorodecka J. (2007). Is lámatyáve a linguistic heresy? Iconicity in JRR Tolkien’s invented languages. In Fischer O., Ljungberg C., Tabakowska E. (Eds.), Insistent images (pp. 103–130). John Benjamins.
- Ramachandran V. S., Hubbard E. M. (2001). Synaesthesia: A window into perception, thought and language. Journal of Consciousness Studies, 8, 3–34.
- Rausch R. (2014). Sound symbolism in Elvish. In Arda Philology 4, Proceedings of the Fourth International Conference on J.R.R. Tolkien’s Invented Language (pp. 82–119). Arda Society.
- R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
- Reiterer S. M., Kogan V. V., Seither-Preisler A., Pesek G. (2020). Foreign language learning motivation: Phonetic chill or Latin lover effect? Does sound structure or social stereotyping drive FLL? Psychology of Learning and Motivation, 72, 165–205.
- Ryan K. N. (2014). Tolkien’s tongues: The phonetics and phonology of Tolkien’s Quenya language [B.A. thesis]. Department of Linguistics, Swarthmore College.
- Schäfer-Vincent K. (1983). Pitch period detection and chaining: Method and evaluation. Phonetica, 40(3), 177–202.
- Shadlag K. (2006). Atlantean language institute. https://web.archive.org/web/20140323053118/http://www.freewebs.com/keran_shadlag/
- Skowrońska D. E. (2018). The true name of things: Invented languages in Ursula K. Le Guin’s fiction. Symbolae Europaeae, 13, 37–48.
- Smith R. (2006). Fitting sense to sound: Linguistic aesthetics and phonosemantics in the work of J.R.R. Tolkien. Tolkien Studies, 3, 1–20.
- Smith R. (2007). Inside language: Linguistic and aesthetic theory in Tolkien. Walking Tree.
- Stanley J. (2003). Tongue of malevolence: A linguistic analysis of constructed fictional languages with emphasis on languages constructed for “the other” [Master’s thesis]. Duke University.
- Stockwell P. (2006). Invented language in literature. In Brown K. (Ed.), Encyclopedia of language & linguistics (2nd ed., pp. 3–11). Elsevier.
- Tikka P. (2007). The finnicization of Quenya. Arda Philology, 1, 1–13.
- Tolkien J. R. R. (1981). The letters of J. R. R. Tolkien (Carpenter H., Tolkien C., Eds.). Harper-Collins.
- Tolkien J. R. R. (1993). Morgoth’s ring (Tolkien C., Ed.; Vol. 10). HarperCollins.
- Tolkien J. R. R. (2002). Sauron defeated. The end of the third age: The history of The Lord of the Rings, part four & the Notion Club papers & the drowning of Anadûnê (Tolkien C., Ed.; Vol. 9). HarperCollins.
- Tolkien J. R. R. (2006). Early Elvish poetry and Pre-Fëanorian alphabets. In Gilson C., Smith A. R., Wynne P. H., Hostetter C. F., Welden B. (Eds.), Parma Eldalamberon. Parma Eldalamberon.
- Tolkien J. R. R. (2008). Tolkien on fairy-stories: Expanded edition, with commentary and notes (Flieger V., Anderson D., Eds.). HarperCollins.
- Tolkien J. R. R. (2016). A secret vice: Tolkien on invented languages (Fimi D., Higgins A., Eds.). HarperCollins.
- Tolkien J. R. R. (2021). The lord of the rings (One volume). HarperCollins.
- Trudgill P., Giles H. (1978). Sociolinguistics and linguistic value judgements: Correctness, adequacy and aesthetics. In Coppieters F., Goyvaerts D. (Eds.), Functional studies of language and literature (pp. 167–190). Story-Scientia.
- Van Bezooijen R. (2002). Aesthetic evaluation of Dutch. In Long D., Preston D. R. (Eds.), Handbook of perceptual dialectology (Vol. 2, pp. 13–30). John Benjamins.
- Van Hoey T., Thompson A. L., Do Y., Dingemanse M. (2023). Iconicity in ideophones: Guessing, memorizing, and reassessing. Cognitive Science, 47(4), Article e13268. https://doi.org/10.1111/cogs.13268
- Winkelmann R., Harrington J., Jänsch K. (2017). EMU-SDMS: Advanced speech database management and analysis in R. Computer Speech & Language, 45, 392–410. https://doi.org/10.1016/j.csl.2017.01.002
- Winter B., Perlman M. (2021). Size sound symbolism in the English lexicon. Glossa: A Journal of General Linguistics, 6(1), 79. https://doi.org/10.5334/gjgl.1646