Proceedings of the National Academy of Sciences of the United States of America. 2009 Jun 24;106(26):10587–10592. doi: 10.1073/pnas.0903616106

Universals and cultural variation in turn-taking in conversation

Tanya Stivers a,1, N J Enfield a, Penelope Brown a, Christina Englert b, Makoto Hayashi c, Trine Heinemann d, Gertie Hoymann a, Federico Rossano a, Jan Peter de Ruiter a,e, Kyung-Eun Yoon f, Stephen C Levinson a
PMCID: PMC2705608  PMID: 19553212

Abstract

Informal verbal interaction is the core matrix for human social life. A mechanism for coordinating this basic mode of interaction is a system of turn-taking that regulates who is to speak and when. Yet relatively little is known about how this system varies across cultures. The anthropological literature reports significant cultural differences in the timing of turn-taking in ordinary conversation. We test these claims and show that in fact there are striking universals in the underlying pattern of response latency in conversation. Using a worldwide sample of 10 languages drawn from traditional indigenous communities to major world languages, we show that all of the languages tested provide clear evidence for a general avoidance of overlapping talk and a minimization of silence between conversational turns. In addition, all of the languages show the same factors explaining within-language variation in speed of response. We do, however, find differences across the languages in the average gap between turns, within a range of 250 ms from the cross-language mean. We believe that a natural sensitivity to these tempo differences leads to a subjective perception of dramatic or even fundamental differences as offered in ethnographic reports of conversational style. Our empirical evidence suggests robust human universals in this domain, where local variations are quantitative only, pointing to a single shared infrastructure for language use with likely ethological foundations.

Keywords: cooperation, response speed, social interaction


Crucial to understanding the nature and origins of human language, perhaps our most distinctive trait, is understanding the social-interactional matrix in which it is used. Informal conversation is where language is learned and where most of the business of social life is conducted. A fundamental part of the infrastructure for conversation is turn-taking, or the apportioning of who is to speak next and when (1). Previous research on turn-taking has examined cues used in recognizing opportunities for turn transition (1–4), the time course of a turn in an exchange (5), and the timing of turn transitions (1, 6–10). In English conversation, speakers do not wait for pauses to begin their turn but avoid gaps and overlaps. To achieve this they use grammar, prosody, and pragmatics to project when they can start a next turn, suggesting that turn-taking is specifically organized to achieve this close timing. Here, we consider whether this organization varies across human cultures or is reflective of a universal system of rules for turn-taking in conversation. To our knowledge, no previous study has set out to test the robustness of a turn-taking system for informal interaction across the diversity of human cultures.

In the anthropological literature there are frequent claims that cultures differ radically in the timing of conversational turn-taking, and thus that the findings for English are culture-specific. Nordic cultures, for example, are said to relish long delays between one turn and the next. As the report goes, “Two brothers of Häme (Finland) were on their way to work in the morning. One says, ‘It is here that I lost my knife’. Coming back home in the evening, the other asks, ‘Your knife, did you say?’” (11). Or receiving visitors in the North of Sweden: “We would offer coffee. After several minutes of silence the offer would be accepted. We would tentatively ask a question. More silence, then a ‘yes’ or a ‘no’” (12). Compare this preference for silence between turns with the reported “fast rate of turn-taking” and “preference for simultaneous speech” in New York Jewish conversation (13) or the “anarchic” conversation of an Antiguan village, in which there is said to be “no regular requirement for 2 or more voices not to be going on at the same time” (12). Although there are many such claims in the anthropological literature of cultures where substantial overlap is the norm (14–16) or where long silences are said to be the rule (11, 12, 17), no broad-ranging, quantitative comparison has been made. These claims suggest that there are culturally variable turn-taking systems.

In contrast to these claims of diversity, there are arguments in favor of a universal system for turn-taking that, as in English, follows a norm of “minimal-gap minimal-overlap” (18). First, there is a functional basis for turns to be immediately adjacent (rather than overlapping or overly separated): a timely response makes clear its link to another speaker's prior utterance (19), displaying that it is directly contingent on that utterance (20), and showing how the prior utterance was understood, allowing rapid correction if necessary (1, 21, 22). Second, there is evidence for a human ethological basis for adjacent sequences of communicative action and response, for example in very early “proto-conversation” between newborns and caregivers (23–26). Systems in which turn transitions occur with minimal delay or overlap have been described for several languages (1, 8, 27, 28), but no systematic cross-linguistic comparison has been undertaken.

Here, we test these opposing hypotheses: (i) a universal system hypothesis, by which turn-taking is a universal system with minimal cultural variability, and (ii) a cultural variability hypothesis, by which turn-taking is language and culture dependent. The universal system hypothesis predicts a unimodal distribution of turn transitions with most transitions occurring at an offset of ≈0 ms in all languages, whereas the cultural variability hypothesis predicts that overlap is more common in some languages and gaps more common in others.

If a community of speakers shows a highly regular target for the timing of turn transition, deviations will come to have a natural communicative significance (e.g., delays implying problems with the prior utterance), so giving rise to implicit norms of timely response that will be maintained to avoid such added implications (29). Research on questions in English conversation has shown that speakers display inhibition in producing responses that in some way fail to conform with the terms of the question or with the questioner's agenda: thus, responses are often delayed by up to 1 s if, for example, they do not answer the question (e.g., I don't know or I can't remember) (30, 31) or if they give a response that runs against the bias of the question (e.g., A: Is that your car? B: No) (32, 33).

Two further explanations for variation in turn transition speed are associated with nonverbal behavior such as head movements (e.g., nodding) and gaze. Although the rules for turn-taking may discourage overlap in the vocal channel, they may nevertheless leave other channels exempt. If nonverbal signals are viewed as less intrusive upon speech, they may come earlier than purely verbal responses. Additionally, if questioners fix their eye gaze on their addressees, this may be expected to elicit faster responses. Research on conversation in European languages suggests that a speaker's gaze toward a listener may increase the pressure to respond and to respond quickly: eye gaze does this by indicating who is addressed (1), by providing early possible cues that the speaker's turn is now coming to an end (4, 6, 34) and signaling the speaker's heightened expectancy for a response (35). However, gaze behavior may show substantial cultural variation (36).

With respect to these 4 accounts for delayed turn transition (nonanswering responses, disconfirmations, vocal-only responses, and nongazing questions), the 2 hypotheses make different predictions. The universal system hypothesis predicts that the languages will all show the same pattern of slower turn transitions when these factors are present. By contrast, the cultural variability hypothesis predicts that delayed turn transition will be explained by different factors in different languages and that the 4 factors just mentioned are unlikely to account for variation in the same way cross-linguistically.

To test these competing hypotheses, we compared data from video recordings of informal natural conversation in 10 languages from 5 continents, e.g., from Southeast Asia, Mexico, Namibia, and Papua New Guinea (see Table S1). The languages vary fundamentally in type (e.g., in word order, sound structure, grammatical options) and are drawn from cultures of quite different kinds (from hunter–gatherer groups to peasant societies to large-scale postindustrial nations). To achieve a natural control over the discourse environment to be compared, we took advantage of a universal context for turn transition, namely that between questions and their responses. For optimal comparability we restricted the comparison to polar questions (questions that expect a yes or no answer). These are the most common type of questions in 9 of the 10 languages (67% of total questions in our 10-language sample were of this type), and they are also logically the simplest type: unlike responses to WH- questions (see Table S2), the desired response to a polar question comes from a small, closed set, usually yes or no. Although not all languages have precise equivalents of English yes and no, they all do have ways of asking polar questions and ways of conveying the basic functions of yes and no. For example, yes can be conveyed by repeating the key information in the question [e.g., Q: Is John going?, A: He's going (= yes)] or the use of nonstandard expressions like uh huh or yep. To determine whether question–response sequences are representative of turn-taking in general, we examined a corpus of Dutch conversation (8) for timing across all types of turns and responses and found no difference between response times after questions and nonquestions (see Fig. S1). This suggests that the use of question–answer sequences is a reasonable proxy for turn-taking more generally.

Results

Distribution of Turn Transitions.

We call the temporal relation between a turn and its response the response offset, measured in milliseconds: a gap gives a positive offset and an overlap gives a negative offset. As Fig. 1 shows, the response timings for each language, although slightly skewed to the right, have a unimodal distribution with a mode offset between 0 and +200 ms and an overall mode of 0 ms (see Table S3). The medians are also quite uniform, ranging from 0 ms (English, Japanese, Tzeltal, and Yélî-Dnye) to +300 ms (Danish, ‡Ākhoe Hai‖om, Lao) (overall cross-linguistic median +100 ms).

Fig. 1.

The distribution of turn transitions for each language in the 10 sample languages. All distributions are unimodal with the highest number of transitions occurring between 0 and 200 ms. The percentage of turn transitions is shown on the y axis, and milliseconds of turn offset are shown on the x axis.

The means display somewhat more variation, as shown in Fig. 2. Danish has the slowest response time on average (+469 ms) and Japanese has the fastest (+7 ms). The mean response offset for the full dataset is +208 ms, and the language-specific means fall within ≈250 ms either side of this cross-language mean, approximately the length of time it takes to produce a single English syllable (37).
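The spread around the cross-language mean can be checked with a few lines of arithmetic. The sketch below uses only the three figures quoted in this paragraph (the Danish mean, the Japanese mean, and the full-dataset mean); the variable names are illustrative and do not come from the study.

```python
# Means quoted in the text, in milliseconds.
GRAND_MEAN_MS = 208
language_means_ms = {"Danish": 469, "Japanese": 7}

# Signed deviation of each language mean from the cross-language mean.
deviations_ms = {
    language: mean - GRAND_MEAN_MS
    for language, mean in language_means_ms.items()
}

for language, deviation in deviations_ms.items():
    print(f"{language}: {deviation:+d} ms from the cross-language mean")
```

The slowest and fastest languages sit at +261 ms and −201 ms from the overall mean, which is the spread that the ≈250 ms figure summarizes.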

Fig. 2.

The mean time (in ms) of turn transitions for each language (±1 SD) in the 10 sample languages shows that speakers of all languages have an average offset time that is within 500 ms. However, there is a continuum of faster to slower averages across the sample. Milliseconds are shown on the x axis. Languages are arrayed along the y axis. Da, Danish; ‡Ā, ‡Ākhoe Hai‖om; La, Lao; It, Italian; En, English; Ko, Korean; Du, Dutch; Yé, Yélî-Dnye; Tz, Tzeltal; Ja, Japanese.

The Implications of Turn Delay.

Answering vs. not.

Speakers of all of the languages provide answers significantly faster than nonanswer responses to questions (Fig. 3). In all of the languages we also found a greater proportion of answers than nonanswer responses (ranging from 64% of all responses in Korean to 87% in Dutch and Yélî-Dnye) (see Table S4).

Fig. 3.

The mean time of turn transition for responses coded as answers versus responses coded as nonanswer responses in each of the languages. Speakers of all languages produced answers (gray) faster, on average, than they produced nonanswer responses (black). *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001. Milliseconds are shown on the x axis. Languages are arrayed along the y axis. Da, Danish; ‡Ā, ‡Ākhoe Hai‖om; La, Lao; It, Italian; En, English; Ko, Korean; Du, Dutch; Yé, Yélî-Dnye; Tz, Tzeltal; Ja, Japanese.

Confirming vs. disconfirming.

Within the set of answers, those that are confirmations are delivered faster than disconfirmations in all languages, between 100 and 500 ms faster on average (see Fig. 4). This difference reaches significance in 7/10 languages. In all of the languages, we also found a greater proportion of confirmations than disconfirmations (ranging from 70% of all answers in Danish to 89% of all answers in Yélî-Dnye; see Table S4). This advantage for affirmation also holds, incidentally, even if the affirming response is negative in form (as in “You're not coming?” and “No, I'm not”), showing that it is not simply a side-effect of the greater processing costs of negative responses (38) (confirmations using no are not significantly slower than confirmations using yes; 90 vs. 36 ms; t[693] = −1.1).

Fig. 4.

The mean time of turn transition for responses coded as confirmations versus responses coded as disconfirmations in each of the languages. Speakers of all languages produced confirmations (gray) faster, on average, than they produced disconfirmations (black). *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001. Milliseconds are shown on the x axis. Languages are arrayed along the y axis. Da, Danish; ‡Ā, ‡Ākhoe Hai‖om; La, Lao; It, Italian; En, English; Ko, Korean; Du, Dutch; Yé, Yélî-Dnye; Tz, Tzeltal; Ja, Japanese.

The Implications of Nonverbal Channels.

Visible responses vs. vocal-only responses.

Visible responses were most commonly head nods, but we also found shrugs and head shakes, and in some languages like Yélî-Dnye conventionalized extended blinks and eyebrow flashes in response to questions. When visible responses occurred in response to a question, they were faster than speech in every language (see Fig. 5). This reached significance in 7/10 of the languages even though there was substantial variation in how frequently visible responses were included in a response (from 21% of responses including a visible component in ‡Ākhoe Hai‖om to 60% in Italian) (see Table S4).

Fig. 5.

The mean time of turn transition for responses coded as including a visible response versus responses coded as vocal only in each of the languages. Speakers of all languages produced responses with a visible component (gray) faster, on average, than they produced vocal only responses (black). *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001. Milliseconds are shown on the x axis. Languages are arrayed along the y axis. Da, Danish; ‡Ā, ‡Ākhoe Hai‖om; La, Lao; It, Italian; En, English; Ko, Korean; Du, Dutch; Yé, Yélî-Dnye; Tz, Tzeltal; Ja, Japanese.

Questioner gaze vs. no gaze.

We found in 9 of the 10 languages that responses were delivered earlier if the speaker was looking at the recipient while the question was asked (Fig. 6). The differences reach statistical significance in only 5 languages. That Danish shows the opposite timing trend, although nonsignificant, combined with known differences in reliance on interactional gaze in different languages, suggests that gaze may be more culturally variable than other behaviors (36). This is also supported by the range of frequencies of gaze to addressee (from 21% in ‡Ākhoe Hai‖om to 88% in Japanese) (see Table S4). This is incidentally not the expectation in the literature, where addressee gaze rather than speaker gaze has been argued to be the norm (4, 6, 39).

Fig. 6.

The mean time of turn transition for questions coded as with speaker gaze versus questions coded as without speaker gaze in each of the languages. Speakers of 9/10 languages produced responses to questions with speaker gaze (gray) faster, on average, than they produced responses to questions without speaker gaze (black). *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001. Milliseconds are shown on the x axis. Languages are arrayed along the y axis. Da, Danish; ‡Ā, ‡Ākhoe Hai‖om; La, Lao; It, Italian; En, English; Ko, Korean; Du, Dutch; Yé, Yélî-Dnye; Tz, Tzeltal; Ja, Japanese.

Multivariate Results.

The results so far show broadly similar patterns of response timing across the languages, with the same 4 factors each independently accounting for faster or slower than average responses. Multivariate analysis confirms that these 4 are significant predictors of the speed of response in turn transition (see Table 1). Nonanswer responses are significantly slower than answer responses (positive value indicates longer turn transition time). Confirmation responses are faster than disconfirmation responses (negative value indicates shorter turn transition time); visible responses are faster than responses without a visible component; and questions delivered with questioner gaze are responded to more quickly than questions without questioner gaze. Information requests are slower than questions with other functions such as those initiating repair. These factors are significant predictors even when considered together. The model also shows that the conversation in which the question occurs and the language being spoken both further contribute to the variation we observed (see Fig. 2). However, because language spoken and source conversation were treated as different levels, the 4 predictor variables are shown to be language-independent predictors.

Table 1.

Mixed-level multiple linear regression model predicting response time

Level 1 variables                          Estimate       95% CI
Response variables
  Nonanswer response                       131.78***      59.34, 204.23
  Confirmation                             −206.87***     −268.61, −145.12
  Visible response component               −86.93***      −136.76, −37.10
Question variables
  Information request only                 129.38***      79.30, 179.46
  Questioner gaze                          −69.28**       −123.48, −15.08
Context variables
  Level 2: Variance at language level      19555.05*      7342.23, 57304.20
  Level 3: Variance at interaction level   14091.24***    7715.57, 25735.37

The multivariate model shows that nonanswer responses are slower than answer responses and responses to information questions are slower than responses to other sorts of questions. Confirmations, responses with a visible component, and responses to questions that are delivered with speaker gaze are shown to be faster than disconfirmations, vocal responses and other sorts of questions, respectively. Language (i.e., the language being spoken) and conversation (i.e., the conversation from which a data point was taken) were treated as levels and thus the results are language independent.

*, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001.
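To make the estimates in Table 1 easier to read, the following sketch adds up the fixed-effect estimates for two combinations of features and converts the two variance components to standard deviations. It assumes simple additive indicator coding of each variable, which the table suggests but does not spell out. Because the model intercept is not reported, the sums are shifts relative to a response with none of the coded features, not absolute predicted offsets. The function and dictionary names are hypothetical.

```python
import math

# Fixed-effect estimates copied from Table 1, in milliseconds.
ESTIMATES_MS = {
    "nonanswer_response": 131.78,
    "confirmation": -206.87,
    "visible_component": -86.93,
    "information_request_only": 129.38,
    "questioner_gaze": -69.28,
}


def relative_shift_ms(**features: bool) -> float:
    """Sum the estimates of the features that are present.

    Assumes additive indicator coding. The intercept is not reported in
    Table 1, so the result is a shift relative to the reference case.
    """
    return sum(ESTIMATES_MS[name] for name, present in features.items() if present)


# A confirmation with a visible component, answering a question asked with gaze.
fast_case = relative_shift_ms(
    confirmation=True, visible_component=True, questioner_gaze=True
)

# A nonanswer response to an information request.
slow_case = relative_shift_ms(
    nonanswer_response=True, information_request_only=True
)

# Variance components from Table 1, expressed as standard deviations in ms.
language_sd_ms = math.sqrt(19555.05)
interaction_sd_ms = math.sqrt(14091.24)

print(round(fast_case, 2), round(slow_case, 2))
print(round(language_sd_ms, 1), round(interaction_sd_ms, 1))
```

Under these assumptions the two cases differ by more than 600 ms, while the language-level standard deviation is on the order of 140 ms, which gives a rough sense of how within-language factors compare with between-language variation.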

Discussion

Our results provide substantial support for the universal system hypothesis. The findings suggest a strong universal basis for turn-taking behavior, in that all languages show a similar distribution of response offsets (unimodal peak of response within 200 ms of the end of the question). The distribution of response offsets in all languages reflects a target of minimal overlap and minimal gap between turns. These results also show that the same set of explanations for a delayed response applies across languages.

Amid a strong universal pattern, we do see measurable cultural differences. However, the range that we show, mean offset of next turn in each language departing no more than a quarter-second from the overall mean, is not of the kind that would imply fundamentally different types of turn-taking systems in the different languages, as the cultural variability hypothesis would suggest.

Language structure does not explain the variance we observe. Languages that mark questions using a sentence-final marker might plausibly have been associated with slower responses because the fact that the utterance is a question may not be evident until the very end of the turn (28). However, Japanese, Korean, and Lao all use sentence-final marking for questions, yet they do not cluster together within the cross-language range of mean turn offsets (Fig. 2). A converse prediction, that languages like Danish, Dutch, and English, which tend to mark questions at the beginning of a turn, would allow faster responses, also turns out not to hold up. These 3 languages similarly do not cluster together (Fig. 2). Finally, note that this failure of Dutch, English, and Danish to cluster within the cross-language range of mean turn offsets is also evidence that linguistic and cultural kinship (in this case, Germanic) does not predict interactional tempo.

We suggest that the differences involve a different cultural “calibration” of delay, thus constituting minor variation in the local implementation of a universal underlying turn-taking system, in which speakers aim to minimize the perceived gap before producing a following turn at talk. This target for ideal turn transition remains in a narrow window within each language, with each of 4 factors predisposing a response to be slower (or faster in the case of gaze) than the mean and having similar effects for all of the languages. These differences could either be because of a specific cultural interactional pace or follow from more general differences in the overall tempo of social life (40). This would mean that speakers of all languages aim at minimizing significant delays relative to the specific rhythm of that language in conversation (e.g., ref. 41), a perspective that is supported by existing studies of some non-Indo-European languages (27, 28, 42). To address this hypothesis, we coded each response as late or on time relative to a subjective measure of the conversation's rhythm. Mean response times for subjectively on-time responses are much longer in Danish and Lao (203 and 202 ms, respectively) than in Japanese and Tzeltal (36 and 83 ms, respectively), and comparing the 3 languages with longest response offsets to all others, the difference is significant [t(847) = −10.97, P < 0.001]. Thus, a silence of 200 ms, judged as a delay in most languages, was still considered on time in Danish and Lao. Such a silence is thus not phenomenologically salient within a speech community (but may be to an outside observer). In short, what constitutes a subjectively notable delay involves greater absolute duration in some languages than in others. This is consistent with the presence of a universal, stable system of turn-taking that avoids overlap and minimizes gaps, but where there are different local metrics for what counts as a delay in response (18).

The variation we found between mean response times in different languages does not coincide wholly with the ethnographic expectations reported in the literature (14, 17). On the basis of these reports, Italian speakers should be more tolerant of overlap, but we found a mean offset of +310 ms, indicating that they in fact tend to leave a slightly longer than average gap before producing a next turn. And only 17% of all responses overlap, not at all an unusual proportion. Similarly, Japanese speakers are said to leave substantial gaps of silence before responding, but our findings show that Japanese speakers are, on average, the earliest to respond of all of the languages in our sample. Danish speakers were more consistent with ethnographic reports for Nordic languages, showing the longest mean response time in our sample. However, note that the mode was still quite small (100 ms), suggesting that here too speakers target minimizing gaps and overlaps in response offset time. To put the Danish case in context, recall that the mean offset in Danish was less than a half-second in total, a quarter-second deviation from the cross-linguistic average, and thus far from the lengthy pauses measured in minutes or even hours suggested by the ethnographic reports mentioned above.

Conclusion

We have shown strong parallels in turn-taking behavior across 10 languages of varied type, geographical location, and cultural setting. All of the languages show on average a small positive offset in response time, i.e., responses tend to be neither in overlap nor delayed by more than a half-second. The factors that predict whether a response will be faster or slower within each language are identical across the languages. These results offer systematic cross-linguistic support for the view that turn-taking in informal conversation is universally organized so as to minimize gap and overlap, and that consequently, there is a universal semiotics of delayed response.

How then to account for the ethnographic reports of diversity in this domain and the phenomenology of significant cross-cultural differences in response timing? Our data suggest that the regimentation of tempo within a culture is tight, and we come to expect a particular interactional metabolism as it were, slight departures from which have the associated contexts and interactional significance we have established above. Speakers become hypersensitive to perturbations in timing of responses, measured in 100 ms or less. This sensitivity to subtle variation may be responsible for the subjective impression by outsiders of “huge silences” in the case of Nordic languages (insiders, of course, will be calibrated to a local norm, as shown above by different subjective measures of what counts as delay). The actual difference between the 10-language norm and the average turn transition in Danish is confined to the time it takes to utter a single syllable. Our findings are in line with other close examinations of ethnographic outliers; see ref. 27 on the alleged Antiguan preference for overlap (12).

Abstracting from this fine-tuning of interactional tempo across cultures, our results point to robust universals in this domain. Strong universals of this kind seem to be much harder to find in the grammatical structure of languages than in the interactional systems that underlie their use (43). Our results argue for an interactional foundation for language that is relatively stable and relatively separable from the specific languages and cultural practices that instantiate it (23). Understanding this will be crucial for understanding the origin of language and the foundations of social life, because it is out of primordial interaction that languages and cultures are ultimately built.

Data and Methods

Data.

All contributors collected videotaped interactions of maximally informal, spontaneous, naturally occurring conversations, each with 2–6 consenting participants. We confine ourselves here to informal conversation, in which turn-taking is essentially self-organizing. Other procedures hold in highly structured institutional interaction (e.g., courts of law, church services, news interviews), which are often subject to explicit rules for who may speak and when. Participants were often engaged in additional activities (e.g., eating, drinking, or stringing beads). As long as the task was not determining the direction or structure of the conversation this was considered acceptable. Each contributor identified 350 consecutive questions across 5–17 separate interactions (101 conversations in the total dataset). No interaction accounts for >3% of the dataset and very few individuals participated in >1 conversation. Both of these features minimized the influence of any 1 individual or interaction on the overall pattern.

Each question and response, where one occurred, was coded for its form and function. In coding question–response sequences we drew on conversation analytic research on social interaction (1). This study uses an applied version of this method to test findings comparatively. In this study, only functional yes–no questions were included. Codes that are relevant to this study are described in Table S2. Numerical overviews of the data coding by language are provided in Table S4.

Response time was defined as the time elapsed between the end of the question turn and the beginning of the response turn. The response turn was considered to begin if a vocal or gestural response was initiated. Absolute response time was coded auditorily and then the time from the end of the question to the beginning of the response was measured instrumentally by using annotation software ELAN (www.lat-mpi.eu/tools/elan) in 10-ms increments rounded to 100 ms. Subjective measures of whether a response was delivered on time were done auditorily for all responses not delivered in overlap. Coders were asked to consider the rhythm of the conversation leading up to the response and to judge whether it sounded delayed or not.
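The definition above can be sketched as a small function. This is an illustration under stated assumptions, not the procedure used with ELAN: the function name and the millisecond timestamps are hypothetical, and the text does not say how ties at exactly 50 ms were rounded, so the sketch simply uses Python's built-in rounding.

```python
def response_offset_ms(question_end_ms: int, response_start_ms: int) -> int:
    """Return the response offset in ms, rounded to the nearest 100 ms.

    Positive values are gaps (silence before the response begins);
    negative values are overlaps (the response begins before the
    question has ended).
    """
    raw_ms = response_start_ms - question_end_ms
    # Python's round() sends exact ties to the nearest even integer;
    # the tie rule used in the study is not stated, so this is an assumption.
    return round(raw_ms / 100) * 100


# Hypothetical timestamps, for illustration only.
gap = response_offset_ms(question_end_ms=12300, response_start_ms=12430)
overlap = response_offset_ms(question_end_ms=5000, response_start_ms=4780)
no_gap = response_offset_ms(question_end_ms=8000, response_start_ms=8040)

print(gap, overlap, no_gap)  # 100 -200 0
```

With this sign convention, a histogram of the returned values reproduces the layout of Fig. 1, with overlaps to the left of zero and gaps to the right.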

Analytic Methods.

To examine the distribution of turn offsets by language we calculated mean, median, and mode information for each language (see Table S3) and plotted the distribution of turn offset times. To test whether the same factors account for faster and slower response times in different languages, 2 sample t tests were done across 8 categories of data reported in Figs. 3–6: answers vs. nonanswer responses, confirmations vs. disconfirmations, responses with visible components vs. vocal-only responses, and responses to questions with questioner gaze vs. responses to questions without gaze. Details of these t tests are in Tables S5 and S6. This information informed the design of multivariate analysis that was performed by using a multilevel mixed effects linear regression model in STATA. This takes into consideration that there is clustering in the data: responses to questions are clustered within 101 interactions. Interactions are clustered within 10 languages. This model tested for association and explanatory power at each of 3 levels.
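As a rough illustration of the descriptive and bivariate steps described above, the sketch below computes the mean, median, and mode of a list of offsets and a two-sample t statistic. The text does not say whether the t tests assumed equal variances, so the unequal-variance (Welch) form used here is an assumption. The offsets are invented for the example and are not data from the study, and the multilevel model fitted in STATA is not reproduced.

```python
import math
from statistics import mean, median, multimode, variance


def summarize(offsets_ms):
    """Mean, median, and mode(s) of a list of response offsets in ms."""
    return {
        "mean": mean(offsets_ms),
        "median": median(offsets_ms),
        "modes": multimode(offsets_ms),
    }


def welch_t(sample_a, sample_b):
    """Welch two-sample t statistic and approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


# Invented offsets in ms, for illustration only (not data from the study).
answers = [0, 100, 0, 200, -100, 100, 0, 300]
nonanswers = [400, 200, 600, 300, 500, 100]

answer_summary = summarize(answers)
t_stat, dof = welch_t(answers, nonanswers)

print(answer_summary)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A negative t statistic here means the first sample (answers) has the smaller mean offset, that is, the faster responses. Obtaining a P value would additionally require the t distribution, which the Python standard library does not provide.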

Acknowledgments.

We thank Michael Dunn, John Heritage, and Asifa Majid for comments on earlier drafts of this article. This work was carried out in the Multimodal Interaction Project within the Language and Cognition Group at Max Planck Institute for Psycholinguistics, funded by the Max Planck Society.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0903616106/DCSupplemental.
