EFFECTS OF TALKER GENDER ON DIALECT CATEGORIZATION

CYNTHIA G CLOPPER; BRIANNA CONREY; DAVID B PISONI

doi:10.1177/0261927X05275741

Author manuscript; available in PMC: 2011 Mar 17.

Published in final edited form as: J Lang Soc Psychol. 2005 Jun 1;24(2):182–206. doi: 10.1177/0261927X05275741

PMCID: PMC3060037 NIHMSID: NIHMS277290 PMID: 21423866

Abstract

The identification of the gender of an unfamiliar talker is an easy and automatic process for naïve adult listeners. Sociolinguistic research has consistently revealed gender differences in the production of linguistic variables. Research on the perception of dialect variation, however, has been limited almost exclusively to male talkers. In the present study, naïve participants were asked to categorize unfamiliar talkers by dialect using sentence-length utterances under three presentation conditions: male talkers only, female talkers only, and a mixed gender condition. The results revealed no significant differences in categorization performance across the three presentation conditions. However, a clustering analysis of the listeners’ categorization errors revealed significant effects of talker gender on the underlying perceptual similarity spaces. The present findings suggest that naïve listeners are sensitive to gender differences in speech production and are able to use those differences to reliably categorize unfamiliar male and female talkers by dialect.

Keywords: dialect categorization, gender, regional dialect, speech perception, indexical properties

Speech perception involves not only processing of the linguistic message, but also processing and encoding talker-specific properties of the speech signal such as the age, gender, or regional dialect of the talker (Klatt, 1989). The identification of talker gender is a relatively easy task for adult listeners. Lass, Hughes, Bowyer, Waters, and Bourne (1976) reported that naïve adults could identify the gender of unfamiliar talkers based on single vowel utterances with near-ceiling performance. In addition, the listeners were 75% accurate in identifying the gender of the talker using whispered speech, suggesting that gender information is carried in the speech signal by phonological properties other than voicing and fundamental frequency.

More recently, Mullennix and Pisoni (1990) conducted a speeded classification task that provided evidence for the automaticity of talker gender identification. Participants were asked to attend to either the gender of the talker or the linguistic content of the utterance. Mullennix and Pisoni (1990) found that talker gender interfered with performance in the linguistic task more than the linguistic information interfered with the gender identification task, suggesting that gender-specific information is an integral component of the speech perception process.

Gender-specific information in the speech signal is not based exclusively on biological differences between men and women. In her review of the language attitude literature as it pertains to gender differences, Kramarae (1982) noted that although pitch differences between males and females are at least partly the result of biological differences, many other linguistic gender differences appear to be learned. In terms of phonological differences, women are often thought to produce clearer speech with greater modulation of pitch than men. In one recent study, Namy, Nygaard, and Sauerteig (2002) reported that women were more likely to phonologically accommodate their speech to that of an interlocutor than men. In addition, women accommodated more to male interlocutors than to female interlocutors, but men showed no difference in accommodation based on the gender of the interlocutor. These findings suggest that a talker’s gender is a critical component of his or her phonological system.

It is also well known that gender interacts with other social variables that affect phonology, including regional and ethnic dialects. Gender-correlated differences in the production of prestige forms and innovative forms of speech have been reported frequently in the sociolinguistics literature. In an extensive review, Labov (1990, 2001) summarized the observed production differences with the three principles shown below.

Women use more prestige forms than men, and, conversely, men use more nonstandard forms than women.
Women favor incoming prestige forms in changes from above, which are defined as involving forms associated with a high level of social consciousness.
Women also tend to lead in changes from below, which involve variation that has not become stereotyped or associated with particular social groups. However, in a minority of cases such as diphthong centralization on Martha’s Vineyard (Labov, 1963), men can lead in changes from below.

Labov (1990, 2001) also discussed the interaction of gender and social class. In general, the second highest status group, which he refers to as the lower middle class (1990) or the upper working class (2001), shows the greatest gender differences in speech production and the least frequent use of stigmatized forms. For instance, the crossover pattern in which the second highest status group uses the standard or conservative form more frequently in careful speech than the highest status group (e.g., Labov, 1966) is more prominent in women than men (Labov, 2001).

Although exceptions exist to these general observations and some of the concepts involved have been called into question (e.g., the definitions of “standard” and “nonstandard” and the methods of assigning social class to women; see Cheshire, 2002), these generalizations have been supported by a number of influential studies in English-speaking Western communities. In an early study, Trudgill (1974) found that women were more conservative than men in the use of nonstandard forms in Norwich, England. For example, men were more likely than women to use the nonstandard form /n/ for the standard form /ŋ/. Also, working class speakers were more likely to use the nonstandard /n/ than speakers from other social classes. Trudgill hypothesized that women are more status conscious than men and so are more likely to use socially prestigious, linguistically conservative forms and to avoid nonstandard forms like /n/ that are typically associated with working class speech.

Following up on this earlier work, Eckert (1989) emphasized the interaction of gender with other socially constructed categories, such as socioeconomic class and social group. In addition, she hypothesized that because women often have less material capital than men, women rely more on phonological variation as “symbolic capital” to signal their group membership. In a study that documented the progress of the Northern Cities Chain Shift among high school students in a Detroit suburb who belonged to one of two social groups, the jocks or the burnouts, Eckert found that both gender and group membership played a role in speech production. Specifically, the burnouts were further advanced than the jocks in the backing of /ɛ/ and /ʌ/, the newer changes in the shift that were associated more with Detroit urban speech and potentially with the somewhat subversive, counteradult values adopted by the burnouts. However, the girls in both groups were more advanced in the fronting of /æ/ and /a/ and the lowering and fronting of /ɔ/, all of which are older and more established changes in the shift that had already entered the phonological repertoire of most speakers in the community.

In another study, Milroy and Milroy (1993) discussed gender differentiation as it interacts with social class, but they emphasized the role of social network strength in the spread of linguistic change. An individual’s social network was defined by how many ties he or she had to others, in how many capacities he or she interacted with them (network multiplexity), and how many of them knew each other (network density). A strong social network is one that is highly dense and multiplex, and an individual’s network strength score increased with increases in network density and/or multiplexity. Milroy and Milroy hypothesized that linguistic innovators are individuals who know many people but have weak social networks. They found that women had weak network strength scores and also a low incidence of a particular nonstandard variant, intervocalic fricative deletion; in contrast, men had stronger networks and a higher incidence of the nonstandard variant. Similarly, in his data from Philadelphia, Labov (2001) found that linguistic innovators were most often upwardly mobile and locally well-respected women with many nonlocal ties.

The relationship between gender and linguistic variation in speech requires a great deal more exploration, but gender differentiation within regional dialects has clearly been observed (Eckert, 1989; Labov, 2001). Despite these well-documented differences in production between male and female speech, however, research on the perception of dialect variation has been limited almost exclusively to the study of male talkers. In one of the first studies of dialect perception, Preston (1993) asked naïve participants to listen to samples of speech from nine different talkers and then decide where each talker was from out of a set of nine cities located between Dothan, Alabama, and Saginaw, Michigan. All nine talkers in Preston’s study were male.

Purnell, Idsardi, and Baugh (1999) conducted a study of ethnic identification in the San Francisco area using a variant of the matched-guise technique. A single talker left answering machine messages for landlords inquiring about apartments for rent using one of three ethnic guises: Standard American English, African American Vernacular English, and Chicano English. The talker used in the Purnell et al. study was also male.

More recently, researchers in the United Kingdom, the Netherlands, and the United States have conducted forced-choice dialect categorization tasks using multiple talkers from multiple regional varieties of English and Dutch. Williams, Garrett, and Coupland (1999) used male talkers and listeners in their examination of dialect categorization in Wales. Van Bezooijen and Gooskens (1999) also used only male talkers in their studies of dialect categorization in the United Kingdom and the Netherlands. In their dialect categorization research in the United States, Clopper and Pisoni (2004b) used only male talkers as well. In one study that provides an exception to this trend, Van Bezooijen and Ytsma (1999) explored categorization in the Netherlands using a set of 24 female talkers.

Although the precise methodologies and stimulus materials varied considerably across these dialect categorization tasks, performance in every case was poor, though above chance. Williams et al. (1999) reported that Welsh adolescents could identify the dialect of other Welsh adolescents in an eight-alternative forced-choice categorization task with approximately 30% accuracy. Adults in the United Kingdom performed somewhat better, accurately categorizing 52% of the talkers by area in a forced-choice categorization task that included identification of the talkers’ country (e.g., England or Scotland), region (e.g., North England or Wales), and area (e.g., North Wales or South Wales; Van Bezooijen & Gooskens, 1999). Meanwhile, adults in the Netherlands accurately identified the province of origin of 40% of the male talkers and 35% of the female talkers (Van Bezooijen & Gooskens, 1999; Van Bezooijen & Ytsma, 1999) in a similar multistage categorization task. Finally, adult listeners in the United States also performed poorly. Clopper and Pisoni (2004b) found that listeners were only 31% correct in a six-alternative forced-choice dialect categorization task.

The similarity between the results reported for male talkers in the Netherlands by Van Bezooijen and Gooskens (1999) and those reported for female talkers by Van Bezooijen and Ytsma (1999) suggests that the gender of the talkers may not have a large effect on perceptual categorization. However, the gender differences in speech production summarized above indicate that an explicit comparison of perceptual performance across talker gender is warranted. The present study was designed to provide such a comparison using three sets of stimulus materials: male talkers only, female talkers only, and a mixed group of male and female talkers. The data from the male talkers have previously been reported by Clopper and Pisoni (2004b) and are summarized briefly here. The data from the female talker and mixed talker groups are reported below in Experiments 1 and 2, respectively.

Data obtained from all three presentation conditions were collected and analyzed using the same experimental methodology. Naïve participants were asked to listen to isolated English sentences and make judgments about where the talkers were from using a six-alternative forced-choice categorization task. The talkers represented six different regional varieties of American English: New England, North, North Midland, South Midland, South, and West. Each listener completed three blocks of trials containing different stimulus materials. In the first block, the listeners heard each talker reading the sentence, “She had your dark suit in greasy wash water all year.” In the second block, they heard each talker reading the sentence, “Don’t ask me to carry an oily rag like that.” Finally, in the third block, each talker read a different, novel sentence.

The data obtained in each experiment were scored for categorization accuracy and submitted to a hierarchical clustering analysis to determine the perceptual similarity spaces of the dialects (Corter, 1982; Nosofsky, 1985). The first column of Table 1 shows the percentage correct accuracy scores for the male talker condition for each of the three experimental blocks (Clopper & Pisoni, 2004b). Chance performance is 17% in a six-alternative forced-choice task. Therefore, although the listeners in this task performed above chance, their overall performance was still quite poor. Performance on Sentence 2 was significantly worse than performance on Sentence 1 and the novel sentences. Categorization performance was the same across the first and novel sentence conditions. Performance was also assessed separately for each dialect. Clopper and Pisoni (2004b) found that listeners were more accurate in categorizing talkers from New England and the South than any of the other four dialect regions. These two regional varieties are perhaps the most marked dialect regions in the United States in terms of both production (Krapp, 1925) and perception (Preston, 1993), so it is not surprising that these two dialects were also the easiest to identify for naïve listeners.

Table 1.

Mean Percentage Correct Categorization Scores for Each of the Three Experimental Blocks for the Male Talker Condition (Clopper & Pisoni, 2004b), the Female Talker Condition (Experiment 1), and the Mixed Talker Condition (Experiment 2)

Experimental Block	% Male Talkers	% Female Talkers	% Mixed Male and Female Talkers
Sentence 1	33 (5)	31 (7)	33 (8)
Sentence 2	28 (5)	28 (6)	28 (7)
Novel sentences	33 (7)	31 (8)	34 (8)

Note: Standard deviations are shown in parentheses.

Given the high error rate in the categorization task, the stimulus-response confusion matrices were expected to reveal interesting patterns that would reflect the perceptual similarity of the six regional dialects of American English. In particular, the confusion matrices from the male talker condition were submitted to a hierarchical clustering analysis that revealed three major perceptual dialect clusters for the first and novel sentences: New England; South and South Midland; North, North Midland, and West. For the second sentence, a slightly different configuration was found that also consisted of three clusters: New England and North; South and South Midland; North Midland and West (Clopper & Pisoni, 2004b). The solutions to the hierarchical clustering analysis carried out by Clopper and Pisoni (2004b) are depicted graphically in Figure 1. In these figures, perceptual dissimilarity is proportional to the vertical distance connecting any two nodes. That is, the dissimilarity of any two dialects is the sum of the lengths of the fewest vertical branches connecting them. The three broad dialect clusters obtained in the perceptual similarity analysis were consistent with earlier findings in the sociolinguistics literature on production variation that describe the three major dialects of American English as eastern, southern, and western (Krapp, 1925) or northern, southern, and western (Labov, 1998).
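
The reading of tree distances described above can be sketched in a few lines of Python. This is only an illustration: the tree shape loosely follows the three-cluster solution reported for the first and novel sentences, while the branch lengths, the internal node names (`southern`, `western`), and the function names are hypothetical and are not taken from Figure 1.

```python
# Hypothetical additive tree: node -> (parent, length of the branch to that parent).
# Branch lengths are invented for illustration; they are NOT the values in Figure 1.
TREE = {
    "root": (None, 0.0),
    "southern": ("root", 0.4),        # internal node (assumed name)
    "western": ("root", 0.3),         # internal node (assumed name)
    "New England": ("root", 0.9),
    "South": ("southern", 0.3),
    "South Midland": ("southern", 0.2),
    "North": ("western", 0.4),
    "North Midland": ("western", 0.2),
    "West": ("western", 0.3),
}


def path_to_root(tree, node):
    """Map each ancestor of `node` (and the node itself) to its summed branch distance from `node`."""
    distances = {node: 0.0}
    total = 0.0
    while tree[node][0] is not None:
        parent, length = tree[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def dissimilarity(tree, a, b):
    """Sum of branch lengths on the shortest path joining leaves `a` and `b`."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    # The lowest common ancestor gives the smallest summed distance,
    # because branch lengths are non-negative.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)
```

With these made-up lengths, two dialects in the same cluster (e.g., South and South Midland) are joined by a short path, whereas dialects in different clusters must be connected through the root and are therefore farther apart.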

Given the findings reported by Van Bezooijen and her colleagues for Dutch varieties (Van Bezooijen & Gooskens, 1999; Van Bezooijen & Ytsma, 1999), we did not expect to find large differences in performance across the three talker gender conditions in our six-alternative categorization task. However, the differences in production between the male and female talkers that would be predicted based on the variationist literature might lead to significant differences in the underlying perceptual similarity spaces of the dialects for each talker gender condition. For example, we would expect the Northern women to show greater advancement in production of the Northern Cities Chain Shift variables than the Northern men. This difference in production should be reflected in the perceptual similarity of the Northern and North Midland dialects: these two dialects should be perceptually more similar for male talkers than for female talkers. Similarly, the Southern dialects (South and South Midland) are generally considered to be less prestigious than the Northern and Western varieties of American English (Preston, 1993), so Southern and South Midland women might exhibit fewer stigmatized features of these dialects in their speech. This difference in production should again be reflected in the perceptual similarity spaces of listeners: Southern and South Midland women should be more similar to women from other regions than Southern and South Midland men are to men from the other dialect regions. These and other differences in production that result from gender differences in linguistic change and prestige form usage should result in different perceptual dialect clusters for male and female talkers.

EXPERIMENT 1

METHOD

Stimulus materials

Stimulus materials consisted of audio recordings of read sentences drawn from the TIMIT Acoustic-Phonetic Continuous Speech Corpus (Fisher, Doddington, & Goudie-Marshall, 1986; Zue, Seneff, & Glass, 1990). The TIMIT corpus includes audio recordings of talkers from eight different dialect regions of the United States. Each talker in the TIMIT corpus was recorded reading 10 sentences. Two of these sentences, the calibration sentences, were the same for each talker and included lexical items and phonetic contexts designed to elicit regional dialect features (Fisher et al., 1986; Zue et al., 1990). These two sentences are shown below. Of the remaining eight sentences for each talker in the TIMIT corpus, five were read by a total of seven talkers in the corpus, and three were read by only a single talker. That is, some of the novel sentences included in the TIMIT corpus were read by multiple talkers and some were not. Although some of the sentence materials were produced by more than one talker on the TIMIT corpus, the novel sentences selected for this experiment were all different for each of the talkers.

She had your dark suit in greasy wash water all year (Sentence 1).
Don’t ask me to carry an oily rag like that (Sentence 2).

Six of the dialect regions were of interest in this experiment: New England, North, North Midland, South Midland, South, and West. Eight female talkers were selected from each of these six dialect regions, for a total of 48 different talkers. All of these talkers were White females between the ages of 20 and 29 at the time the recordings were made and were chosen by two phonetically trained listeners (the first and second authors) as the best representatives of their respective regional dialects. Both of the calibration sentences were used from each talker in this experiment. In addition, a third novel sentence was chosen for each talker. These novel sentences were hand-selected to ensure that relevant linguistic features were present in the stimulus materials that the untrained listeners could use to accurately categorize the talkers. Each of the phonetically trained listeners selected one sentence for each talker independently. When the two trained listeners selected different sentences for a given talker, the final sentence selection was made after additional listening to the materials. The novel sentences used in the experiment were, therefore, judged by the phonetically trained listeners to contain lexical items and/or phonetic contexts associated with regional variation documented in the sociolinguistics literature. Above-chance performance by the listeners validates the selections made by the two phonetically trained listeners. None of the novel sentences was ever repeated more than once during the course of the experiment. Each sentence was saved in a separate digital sound file, and all of the sound files were leveled to 55 dB using Level16 (Tice & Carrell, 1998).

Listeners

The listeners in this study were 35 Indiana University undergraduates, all of whom received partial credit in an introductory psychology course for participating. All listeners whose data were used in the final analysis were monolingual native speakers of American English with no history of a hearing or speech disorder. In addition, all listeners performed above chance on at least one of the three blocks of the experiment. Data from two bilingual listeners were discarded as were data from three listeners who performed statistically at chance on all three blocks of the experiment.¹ Thus, the final analysis included data from 30 listeners, 5 males and 25 females. None of these listeners had participated in the male talkers only experiment (Clopper & Pisoni, 2004b). The listeners represented a range of regional dialects, but approximately half (17 of 30) had lived only in Indiana.
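
One way to decide whether a listener performed "statistically at chance" on a block can be sketched as follows. The criterion actually used is the one given in the note cited above; the sketch below merely assumes an exact one-tailed binomial test at an alpha of .05, with 48 trials per block and a guessing rate of 1/6. The function names and the alpha level are illustrative assumptions.

```python
from math import comb

N_TRIALS = 48      # one block: 48 talkers, one sentence each
P_CHANCE = 1 / 6   # six response alternatives


def binomial_tail(n_trials, n_correct, p_chance):
    """Exact probability of `n_correct` or more correct responses under pure guessing."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


def above_chance(n_correct, alpha=0.05):
    """True if the block score is unlikely under guessing (assumed criterion, alpha = .05)."""
    return binomial_tail(N_TRIALS, n_correct, P_CHANCE) < alpha
```

Under this assumed criterion, a listener who scored at the expected guessing level (8 of 48 correct) on all three blocks would be excluded, whereas a listener who exceeded the cutoff on any one block would be retained.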

Procedure

The experimental procedures used for the dialect categorization task were identical to those used by Clopper and Pisoni (2004b). Listeners were seated at personal computers equipped with headphones and a mouse. Each of the six response alternatives, corresponding to the six dialect regions, was represented by a partial map of the United States, including state boundaries. The maps were arranged on the screen so that they appeared in approximately the correct overall positions but were spatially separated so as to avoid the introduction of response error. The response alternatives are shown in Figure 2. The listeners were given the opportunity to familiarize themselves with the response alternatives and the presentation format prior to beginning the experiment.

The experiment itself consisted of three blocks of test trials that were completed by all listeners in the same order. In the first block, the listeners heard all 48 female talkers read the first calibration sentence once. The presentation order of the talkers was random and was unrelated to dialect region. After the presentation of each sentence, the listeners were asked to indicate which region they thought the talker was from by clicking on that region with the mouse. No feedback was given as to the accuracy of the listener responses. The second block was similar to the first, except that the listeners heard the second calibration sentence. The third block was the same as the first two, but the novel sentences were presented. Throughout the experiment, the sentences were presented at an average signal level of 70 dB SPL over Beyerdynamic DT100 headphones.

RESULTS AND DISCUSSION

Perceptual categorization

Overall categorization performance on the female talkers was consistent with the previous research on male talkers reported by Clopper and Pisoni (2004b). As shown in the second column of Table 1, the listeners were able to correctly categorize 31%, 28%, and 31% of the female talkers in the first, second, and third experimental blocks, respectively.

A repeated-measures ANOVA on talker dialect (New England, North, North Midland, South Midland, South, or West) and experimental block (first, second, or novel sentences) for the female talker condition revealed a significant effect of talker dialect, F(5, 174) = 17.35, p < .001, and a significant Dialect × Block interaction, F(10, 174) = 6.33, p < .001. The main effect of experimental block was not significant. Post hoc Tukey tests on talker dialect revealed better overall performance on New England and Southern talkers than talkers from the other four regions (all ps < .001). Performance on North Midland talkers was also significantly better than performance on the Western talkers (p < .001). None of the other pairwise comparisons were significant.

As shown in Figure 3, the Dialect × Block interaction is the result of increased performance on Southern talkers in the novel sentence block and decreased performance on the New England talkers in the second block of trials. These results are consistent with the earlier findings reported by Clopper and Pisoni (2004b) for male talkers, which also revealed better categorization performance for the New England and Southern talkers and an effect of experimental block for those same groups of talkers.

Figure 3. Percentage Correct Categorization Performance in the Female Talker Condition for Each Talker Dialect Region in Each of the Three Experimental Blocks.

A repeated-measures ANOVA on listener gender (male or female) and experimental block (first, second, or novel sentences) revealed no significant main effects or interactions. Categorization performance was not significantly affected by the gender of the listeners.

Perceptual similarity

Although overall categorization performance was quite poor, a closer examination of the listeners’ responses in the female talker condition revealed that listeners were not responding randomly. Their errors were structured based on the perceptual similarity of the different dialects. To assess the listeners’ perceptual confusions, one stimulus-response matrix was constructed for each experimental block, collapsed across all of the listeners. These matrices were then submitted to the Similarity Choice Model (SCM; Nosofsky, 1985) to extract similarity matrices for use in a hierarchical clustering analysis of perceptual similarity. Two SCM analyses were computed across the three error matrices for the female talker condition. In the first analysis, the similarity parameters were allowed to vary freely across each of the three matrices. In the second, restricted analysis, the similarity parameters were held constant across all three matrices. A comparison of the model fit for the two solutions (unrestricted and restricted by similarity) revealed a significantly better fit for the unrestricted model than the restricted model. This difference in model fit indicates that the different perceptual similarity models in the unrestricted analysis are not due to chance and that different models are necessary to account for the confusion data for each of the three sentence conditions (Sentence 1, Sentence 2, and novel sentences). Clopper and Pisoni (2004b) also found that different perceptual similarity models were necessary to account for the data from the three different experimental blocks in the male talker condition.
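
The two ingredients of this analysis, the pooled stimulus-response matrix and the SCM prediction rule, can be illustrated with a minimal sketch. In its standard form, the SCM predicts the probability of response j to stimulus i as b_j * eta_ij / sum_k (b_k * eta_ik), where eta is a symmetric similarity matrix and b is a set of response biases. The code shows only how counts are tallied and how predictions follow from given parameters; it does not perform the parameter estimation, and the function names and data layout are assumptions made for illustration.

```python
DIALECTS = ["New England", "North", "North Midland", "South Midland", "South", "West"]


def confusion_matrix(trials, labels=DIALECTS):
    """Tally (stimulus dialect, response dialect) pairs pooled across listeners.

    Rows are stimulus dialects and columns are responses.
    """
    index = {name: i for i, name in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for stimulus, response in trials:
        counts[index[stimulus]][index[response]] += 1
    return counts


def scm_probabilities(similarity, bias):
    """Predicted P(response j | stimulus i) = b_j * eta_ij / sum_k (b_k * eta_ik)."""
    probabilities = []
    for row in similarity:
        weighted = [b * eta for b, eta in zip(bias, row)]
        total = sum(weighted)
        probabilities.append([w / total for w in weighted])
    return probabilities
```

For example, with two categories, a similarity of 0.5 between them, and equal biases, the model predicts a correct response on two thirds of trials and a confusion on the remaining third.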

The similarity matrices resulting from the three individual SCM analyses were then submitted to a hierarchical clustering scheme, ADDTREE (Corter, 1982). The ADDTREE analysis computed a hierarchical structure based on the similarity input that is represented graphically in Figure 4 for the female talkers for each of the three sentence blocks. As in Figure 1, perceptual dissimilarity is indicated by the vertical lengths of the branches connecting any two dialects.

Figure 4. Clustering Analysis Results for the Female Talker Condition for Sentence 1, Sentence 2, and Novel Sentences.

The similarity structure of the dialects for the female talkers is similar overall to the results found for the male talkers (Clopper & Pisoni, 2004b). For the first sentence, the listeners perceived three major dialect clusters: New England; South and South Midland; North, North Midland, and West. For the second and novel sentences, the listeners also perceived three major clusters with a slightly different composition: New England and North; South and South Midland; North Midland and West.

Comparison to the male talker condition

As discussed above, performance in the female talker condition was qualitatively similar to performance in the male talker condition. To quantify the cross-condition comparisons, we conducted a repeated-measures ANOVA on talker dialect, experimental block, and talker gender (male or female). The results of the ANOVA revealed a significant main effect of talker dialect, F(5, 276) = 29.13, p < .001, a significant main effect of experimental block, F(2, 552) = 5.40, p < .01, and a significant Dialect × Block interaction, F(10, 276) = 9.39, p < .001. In confirmation of our qualitative assessment of the consistency of the categorization results across gender conditions, neither the main effect of gender nor any of the interactions involving talker gender were significant.

A comparison of the male and female talker conditions for the perceptual similarity analysis did, however, reveal significant differences between the two conditions for each of the three experimental blocks. As in the analysis above for the sentences within the female talker condition, two SCM analyses were conducted using data from the male talker condition and the female talker condition. The model fit of the restricted model, in which similarity parameters were held constant across the two talker conditions, was significantly worse than the model fit for the unrestricted model, in which the similarity parameters varied freely across the two conditions. This difference in model fit suggests that the differences between talker conditions are not due to chance, but reflect an underlying difference in perceptual spaces for male and female talkers.
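
The fit statistic used for the restricted versus unrestricted comparison is not stated in this section. A common choice for nested models of confusion data is the likelihood-ratio statistic G², and the sketch below assumes that choice purely for illustration. It also assumes the usual SCM parameterization (a symmetric similarity matrix with self-similarity fixed at 1, and biases that sum to 1); the function names are hypothetical.

```python
from math import log


def g_squared(observed, predicted):
    """G^2 = 2 * sum(O * ln(O / E)), where E = row total * predicted probability.

    `observed` holds response counts and `predicted` holds model probabilities;
    predicted probabilities are assumed to be strictly positive.
    """
    g2 = 0.0
    for obs_row, prob_row in zip(observed, predicted):
        row_total = sum(obs_row)
        for obs, prob in zip(obs_row, prob_row):
            if obs > 0:  # empty cells contribute nothing to the sum
                g2 += 2.0 * obs * log(obs / (row_total * prob))
    return g2


def scm_free_parameters(n_categories, n_matrices, shared_similarity):
    """Count free SCM parameters under the assumed parameterization."""
    similarity = n_categories * (n_categories - 1) // 2  # off-diagonal pairs
    bias = n_categories - 1                              # biases sum to 1
    similarity_sets = 1 if shared_similarity else n_matrices
    return similarity_sets * similarity + n_matrices * bias
```

Under these assumptions, the difference in G² between the restricted and unrestricted fits would be referred to a chi-square distribution whose degrees of freedom equal the difference in free parameters, which for six dialects and two talker conditions is `scm_free_parameters(6, 2, False) - scm_free_parameters(6, 2, True)`, or 15.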

A visual inspection of Figures 1 and 4 provides some insight into these differences. For Sentence 1 and Sentence 2, the perceptual dialect clustering solutions are virtually identical in structure across the two gender conditions and the difference between the male and female talkers can be attributed to differences in the lengths of the individual branches. In perceptual terms, these differences in branch lengths reflect differences in dialect discriminability. For the novel sentences, the differences between the two gender conditions are more apparent. The Northern male talkers cluster with the North Midland and Western talkers, whereas the Northern female talkers cluster with the New England talkers. Despite the overall similarity in categorization performance as measured in terms of accuracy, significant differences in the perception of male and female talkers were revealed in the similarity space analysis. Experiment 2 examined the perception of dialect variation of male and female talkers more directly using a single set of listeners and a mixed group of talkers.