Abstract
The motor theory of speech perception assumes that activation of the motor system is essential in the perception of speech. However, deficits in speech perception and comprehension do not arise from damage that is restricted to the motor cortex, few functional imaging studies reveal activity in motor cortex during speech perception, and the motor cortex is strongly activated by many different sound categories. Here, we evaluate alternative roles for the motor cortex in spoken communication and suggest a specific role in sensorimotor processing in conversation. We argue that motor-cortex activation is essential in joint speech, particularly for the timing of turn-taking.
Introduction
Spoken communication relies on the listener rapidly decoding the signal that is produced by a speaker. The apparent ease of speech production and perception belies the complexity of the motor acts that are necessary to produce speech and of the resultant complex acoustic signal that the listener processes. Indeed, specific speech sounds (phones) can be hard to separate out and identify within the speech signal. This is because individual speech sounds can be produced in a variety of ways. The way that sounds are produced varies with their position in a word: in British English, the phone /p/ in ‘port’ is quite unlike the /p/ sound in ‘sport’ – e.g. in the former it is aspirated (produced with a puff of air) and in the latter it is not. Speech sounds also vary according to the surrounding phonemes, so the /s/ at the start of ‘sue’ is acoustically different from the /s/ at the start of ‘see’, as the position of the lips anticipates the following vowel. In continuous speech, we can consider the sounds of speech to ‘run into each other’ and influence each other, similar to letters in cursive handwriting as compared to printed letters1.
Experimental evidence of the variability in speech sounds was very striking to researchers who were first able to investigate the structure of speech in spectrograms2 (see Box 1), and it was proposed that the listener tracks the articulatory gestures (that is, the movements of the larynx, jaw, soft palate, lips and tongue that are needed to produce the words that are being spoken) that the speaker aims to make when speaking, even if these gestures are not fully realized. One central feature of this ‘motor theory of speech perception’ is that speech is special in the sense that it is processed differently to other acoustic signals. Specifically, motor theorists state that speech is perceived as the gestures of the articulators, and that this process recruits the motor system3,4,5. The history of the motor theory is well described by Galantucci, Fowler and Turvey6. They note that it has proved difficult to show that the motor acts of speech and the resulting sounds can ever really dissociate, because speech sounds are intrinsically linked to the actions that produce them6. Nonetheless, the motor theory has remained an important approach to our understanding of speech processing.
Box 1. Structure in sound.
Every sound informs the listener about the actions and objects that made it (‘action-to-sound’). Striking a bell produces a characteristic sound with a sharp ‘attack’ time (which reflects the striking action) and the simultaneous onset of many harmonics (reflecting the metal from which the bell is made and the resonance characteristics of its shape). We are sensitive to both physical and dynamic information in sounds – for example, we can hear the acoustic differences between bouncing and breaking objects82.
The way a sound is made also affects how we interact with it (‘sound-to-action’): people move in time to musical sequences based on when they hear the ‘beat’ of the sounds rather than the sound’s physical onset83,84. In turn, these perceptual ‘beats’ depend on how the sound is made – the sharp attack of a drum leads to an earlier ‘beat’ than that of a bowed note on a violin. Thus different orchestral instruments are physically sounded at varying times in order to appear ‘in time’ with each other in an ensemble85.
In speech, there are different kinds of structural information, which are to some degree independent (see Box Figure below). The complex phonetic information is created by the filtering of laryngeal sounds by the positioning and movement of the articulators (shown in the spectrogram). Pitch variation (represented by the fundamental frequency f0) is used to express meaning and emphasis, and carries linguistic meaning in some languages. As in music, speech also has a rhythm, in which the perceptual ‘beats’ correspond to properties and timing of syllables81,86. Acoustically, as in music, these beats correspond broadly to aspects of the amplitude envelope, especially associated with vowel onsets81,86. Brain responses to speech could be differentially tracking any or all of these factors – e.g. the left STS preferentially processes the phonetic information in speech21, 26, 40, 41.
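To make these three strands of structure more concrete, the sketch below shows one way in which they can be derived from a waveform: a spectrogram (phonetic-level spectrotemporal detail), a smoothed amplitude envelope (rhythm), and a rough autocorrelation-based f0 estimate (pitch). This is a minimal illustration only, using a synthetic vowel-like signal; the signal, parameter values and thresholds are our own assumptions and are not the analyses used in any of the studies cited here.

```python
# Minimal, illustrative sketch: separating three strands of structure in a
# speech-like signal - spectrotemporal detail, amplitude envelope, and f0.
import numpy as np
from scipy.signal import spectrogram, hilbert, butter, filtfilt

fs = 16000                                  # sampling rate (Hz), assumed
t = np.arange(0, 1.0, 1 / fs)               # one second of signal
f0 = 120 + 20 * np.sin(2 * np.pi * 2 * t)   # slowly varying fundamental (Hz)

# Build a crude vowel-like sound: harmonics of f0, amplitude-modulated at a
# syllable-like rate (~4 Hz) to give the signal a rhythmic envelope.
phase = 2 * np.pi * np.cumsum(f0) / fs
source = sum(np.sin(k * phase) / k for k in range(1, 10))
syllable_rhythm = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))
x = source * syllable_rhythm

# 1) Spectrotemporal (phonetic-level) detail: a spectrogram.
freqs, times, sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
print("spectrogram shape (frequency bins x time frames):", sxx.shape)

# 2) Rhythm: the amplitude envelope, via the Hilbert transform plus smoothing.
env = np.abs(hilbert(x))
b, a = butter(2, 10 / (fs / 2))             # keep only slow (<10 Hz) modulations
env_smooth = filtfilt(b, a, env)
print("envelope peaks in 1 s (approx. syllable rate):",
      int(np.sum(np.diff(np.sign(np.diff(env_smooth))) < 0)))

# 3) Pitch: a rough f0 estimate from the autocorrelation of a 40-ms frame.
frame = x[:int(0.04 * fs)]
ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
lag_min, lag_max = int(fs / 300), int(fs / 75)   # search range 75-300 Hz
best_lag = lag_min + np.argmax(ac[lag_min:lag_max])
print("estimated f0 (Hz):", round(fs / best_lag, 1))
```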
An alternative view states that speech perception involves acoustic processing of the signal7. According to this ‘acoustic’ perspective, speech is perceived by analysing its spectrotemporal properties. Constellations of acoustic cues such as voicing, spectral centre of gravity and amplitude are mapped onto perceptual categories such as phonemes, diphones and syllables. No one acoustic feature is dominant in this process: even a simple phonetic contrast such as the difference between /aba/ and /apa/ is distinguished by at least sixteen different acoustic cues8. Speech perception is thus viewed as the recognition of complex acoustic patterns, which occurs entirely in the auditory system and does not involve the motor system. In anatomical terms, speech processing would thus depend on auditory and auditory association areas in the temporal lobes rather than on motor and premotor cortex9,10.
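As a deliberately simplified illustration of what mapping an acoustic cue onto a perceptual category can mean, the sketch below computes one such cue, the spectral centre of gravity, for two synthetic fricative-like noises and applies a single threshold to label them. The stimuli, band edges and decision threshold are illustrative assumptions of ours, not values taken from the studies cited; real speech perception draws on many cues in combination.

```python
# Illustrative sketch: one acoustic cue (spectral centre of gravity) used to
# separate two fricative-like noises, roughly analogous to 's' versus 'sh'.
# All stimulus parameters and the decision threshold are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(0)
fs = 16000
noise = rng.standard_normal(fs)             # 1 s of white noise

def bandpass(x, lo, hi, fs):
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

s_like = bandpass(noise, 4000, 7500, fs)    # 's'-like: energy at high frequencies
sh_like = bandpass(noise, 2000, 4000, fs)   # 'sh'-like: energy lower down

def spectral_centre_of_gravity(x, fs):
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return np.sum(freqs * spectrum) / np.sum(spectrum)

for name, sound in [("s-like", s_like), ("sh-like", sh_like)]:
    cog = spectral_centre_of_gravity(sound, fs)
    label = "/s/" if cog > 4500 else "/sh/"  # toy single-cue classifier
    print(f"{name}: centre of gravity = {cog:.0f} Hz -> labelled {label}")
```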
In this Perspective, we critically evaluate the role of the motor cortex in speech processing. Two recent papers have specifically criticized the evidence for a role of motor systems in human perception11, with particular reference to speech12. We not only assess evidence for and against the motor theory of speech perception, but also attempt to integrate other possible roles for motor processes in speech, and discuss the extent to which motor processes can be considered essential13 to speech processing.
Motor responses to speech
Evidence for motor cortex involvement in speech perception
The motor theory of speech perception has recently received considerable support from studies of the motor system and its response to heard speech6. Here we define the motor system as primary motor cortex (BA 4) and premotor cortex (BA 6), including the supplementary motor area (SMA), while excluding Broca’s area in its widest sense (i.e. BA 44 as well as BA 45), because it is not consistently activated by speech production14. These regions form part of the ‘mirror’ or action execution system, that is, cortical fields that are activated during both the performance and the perception of a motor act. Several studies have shown that hearing speech increases activation in motor and premotor cortex, both in terms of enhanced EMG measurements from the muscles around the mouth in TMS studies15, and in terms of peaks of activation in motor cortex in fMRI studies16, 17. Disruption of premotor-cortex activity with TMS has been shown to detrimentally affect the discrimination of stop consonants in syllables that were masked with white noise13. Motor responses to speech are specific to the ways in which speech sounds are produced: speech sounds that are made in different ways (e.g. /p/ vs /t/) differentially activate the motor areas that are associated with different articulators18. For example, the pattern of motor-cortex response to hearing a bilabial plosive /p/ is different from the motor-cortex response to hearing an alveolar plosive /t/19. Together, these studies support the hypothesis that the motor system has a role in speech perception.
Anatomical links between audition and motor cortex
In terms of neuroanatomy, there is evidence for a role of both acoustic and motoric processing of speech and sound9. In the primate temporal lobe there are at least two streams of processing from the primary auditory cortex: one stream runs anteriorly down the temporal lobe and is sensitive to aspects of the acoustic information in sounds and speech (a ‘what’ stream), and another stream runs posteriorly from the primary auditory cortex and shows properties of sensorimotor interactions for speech and other sounds (a ‘how’ pathway) (see Figure 1)9,10. Although the two auditory streams interact functionally20, they also show different response profiles in functional imaging experiments. For example, the anterior stream is sensitive to the intelligibility of speech21, whereas the posterior route is sensitive to the perceived difficulty of production of non-native phonemes17 and is activated by silent articulation22. Speech has thus been argued to be processed both as a sound (‘what’) and as an action (‘how’)9. Evidence from non-human primate studies indicates that these anterior and posterior streams project to adjacent but non-overlapping regions in the frontal cortex, and that the posterior stream specifically projects to the premotor cortex23. The posterior stream thus provides an anatomical route for the gestural processing of speech that is predicted by the motor theory3 and for the motor sensitivity to speech sounds that has been reported in imaging studies16,17.
Figure 1. The anatomy of sound perception.
This figure shows the posterior and anterior auditory streams. The main anatomical structures that are recruited for speech perception are shown in colour on a lateral view of the human brain. From primary auditory cortex (A1), the arrows depict the putative direction of an anterior and posterior stream of processing for speech and other sounds. The anterior stream decodes meaning in sounds (“what”) and encompasses auditory association cortex in the superior temporal lobe (green) and inferior frontal cortex (red), whereas the posterior pathway has been suggested to be engaged in sensorimotor integration and spatial processing (“how”, “where”), and includes parts of posterior superior temporal gyrus (green), inferior parietal cortex (blue), motor and sensory areas (orange and yellow, respectively), and inferior frontal gyrus (red)9, 23, 93.
There is evidence that the ‘how’ pathway is less available for conscious processing than the ‘what’ pathway. For example, listeners can accurately tap along to sound sequences, and accurately change the timings of their taps to track small timing changes in the sequence (a sensorimotor task). However, in a purely perceptual listening task with the same stimuli (and no tapping), the same listeners need relatively larger timing changes in order to report hearing a difference24. Likewise, in speech production, speakers can correct their articulations to compensate for online acoustic distortions of the sounds they are making without being aware that these distortions are occurring25. Speaking when one’s speech is distorted in real time strongly activates posterior auditory areas25, which implicates the ‘how’ pathway in both the detection of and the compensation for distortions of speech output. In contrast, the perceptual identification and recognition of the sounds of speech primarily involves temporal lobe areas lateral and anterior to primary auditory cortex that are part of the anterior ‘what’ pathway21,26.
Dissociation of speech input and output systems in neuropsychology
The sensorimotor (‘how’) stream of auditory processing clearly provides an anatomical route for linking speech perception and speech production and for motor representations to play a role in the perception of speech. However, what remains unclear is what that role might be – are motor processes truly essential for speech perception13 or is their involvement less central to the reception of speech and more important in other linguistic and non-linguistic computations (e.g. sequence processing, semantic representations)? Certainly, there is considerable clinical evidence (see below) that impairments in speech comprehension can be dissociated from impairments in speech production. Assuming that speech comprehension is preceded by perception, this dissociation would argue against a central role of motor processes in speech perception. For example, neuropsychological evidence suggests that disorders of the motor control of speech output are associated with anterior brain lesions27 and do not compromise speech comprehension, which implies that the perceptual systems underlying comprehension are unimpaired in patients with such disorders. Thus, patients with anterior brain lesions, leading to expressive aphasia, perform normally in tests of auditory speech perception and comprehension, e.g. auditory word-to-picture matching28,29. Conversely, patients with receptive aphasia following a stroke show deficits in speech perception and comprehension (e.g. poor scores on auditory word-to-picture matching29), whereas their speech production is fluent, if nonsensical30 (unlike that of patients with an expressive aphasia). Both receptive and expressive aphasic patients show abnormalities in the explicit perceptual categorization of speech sounds (e.g. labeling a sound as /da/ or /ga/), which would, for the expressive patients, suggest a role for motor control areas in speech perception. However, these deficits do not correlate with the patients’ speech comprehension measures in either patient group31, and it has been suggested that the categorization task does not measure the speech perception processes that ordinarily contribute to speech understanding10,11,17. Thus, from the clinical literature it appears that motor processes that are involved in speech production do not directly contribute to speech perception and comprehension10,11.
Dissociation of speech input and output systems in development
There is also scant evidence that motor processes have a central role in the development of speech perception and comprehension. A lack of auditory stimulation during development has detrimental effects on speech production, but a lack of speech motor output does not necessarily affect speech perception and comprehension. Thus, children born with hearing loss can learn to speak, as they have somatosensory feedback, but this is difficult (even with relatively moderate hearing loss) and can take longer than in hearing children32. By contrast, individuals who have grown up with severe dysarthria or anarthria, i.e. with very impaired or no speech production ability, can have intact speech comprehension33. These clinical findings suggest that speech comprehension and production dissociate in development. In normally developing infants, speech perception skills are in place well before speech production skills34. Vowel perception performance at 6 months predicts later vocabulary acquisition35, suggesting that the detailed perceptual processing of speech underlies early language development, before skilled speech production abilities are acquired. Studies have shown that although there are links between the development of speech production and manual motor skills36, variability in speech perception and comprehension (at 21 months) correlates with neither speech production measures nor measures of oro-facial or manual motor skills. Instead, speech perception skills vary with cognitive measures and socio-economic status37. If there are interactions between perceptual processes and motor representations in speech perception, they are not factors that drive variation in the development of spoken language.
Functional neuroimaging studies of speech perception
Speech perception was one of the earliest topics to be addressed with functional brain imaging, possibly owing to the successful delineation of cortical areas associated with speech in neuropsychological studies31. Early speech perception studies revealed extensive bilateral activation of the dorso-lateral temporal lobes, including primary auditory cortex, in response to hearing speech as compared to a silent baseline condition38. More recent studies have refined this activity down to regions in the left (and sometimes right) superior temporal sulcus (STS)21-22, 26, 39-43. It is notable that few of these studies of speech perception showed motor-cortex activity at a whole brain level of analysis, and the discussions correspondingly tend not to focus on motor theories of speech perception. As noted in a recent meta-analysis of the prelexical processing of speech44, speech perception is mainly associated with activation in dorso-lateral temporal lobe regions, with an emphasis on the STS.
However, the pattern of activation to heard speech can extend beyond the dorso-lateral temporal lobes, for example into anterior and ventral temporal lobe regions26 that are associated with semantic knowledge45. Moreover, the cortical activation in response to speech can vary with aspects of the task – for example, frontal activation extending into premotor cortex is seen if participants actively try to detect or classify speech sounds10 or if the speech is distorted in some way42,46. Some authors have expressly addressed the issue of how task structure affects the involvement of motor cortex in speech perception10, 47, and we will consider this issue in the discussion of candidate roles for motor systems in language processing below.
Taking into account the studies we have described so far, there are thus two, largely non-overlapping, sets of data about the neural basis of speech perception. One set reveals activity mainly in the dorso-lateral temporal lobes, and the other shows an involvement of the motor cortex (along with dorso-lateral temporal lobe activation) in speech perception. Why is there such a difference in emphasis and interpretation? Apart from the use (or not) of the motor theory of speech as an explanatory framework, this difference could come down to the use of baseline conditions. Speech is a complex sound as well as a linguistic signal. Hence, to isolate the neural response to the phonetic and linguistic information, functional imaging studies since the early 1990s have tended to use acoustic baselines that possess similar spectrotemporal properties to speech, including spectrally rotated speech21,26,40,41,43, ‘musical rain’42, reversed speech48, and signal correlated noise39,46. In studies that report motor activation in response to speech at a whole brain level of analysis, however, the peak activations do not always correspond to the outcome of a direct comparison with a complex acoustic baseline. This point is illustrated in a summary plot of motor responses to a range of sound categories (including speech, degraded speech, emotional vocalizations, tool sounds and music) (Figure 2). TMS studies of speech perception15,18 have also failed to provide convincing evidence that motor activation in response to speech sounds is necessarily different from that in response to baseline sounds12. For example, one study15 revealed no significant difference between the motor activation due to speech and that due to environmental, non-verbal sounds, such as car engines and breaking glass.
Figure 2. Responses to sound in motor cortex.
A comparison of peak activations in the motor cortex obtained from fourteen functional imaging studies of auditory perception. The peaks were selected on the basis of three criteria: auditory presentation of stimuli, the absence of a motor task (e.g. a button press or covert rehearsal) on contrast-relevant trials, and a whole-brain level of analysis. Peak activations are shown, in MNI space, for responses to speech (red 16, 17, 48, 94, 95), degraded speech (orange 42, 46, 96), human emotional vocalizations (mid blue 80), human song (light blue 97), animal vocalisations (light green 98), tool sounds (dark green 99), action sounds (i.e. derived from humans, such as kissing or ripping paper – yellow 89), and music (pink 100). Peaks in the supplementary motor area (SMA) are plotted on a medial view. Diamond symbols indicate activation peaks that were obtained from direct contrasts of the sound category of interest with a suitably complex auditory baseline; circles indicate no direct comparison with an acoustic baseline.
It is possible that other fMRI techniques, such as multivariate pattern analysis and functional connectivity, may reveal different patterns of activity in brain regions for speech and other sounds44. However, to date, the evidence from functional imaging strongly implies that the motor-cortex activity associated with speech perception, while undoubtedly present, is not driven by the phonetic or linguistic content of the speech. This activity could instead be linked to aspects of auditory events that are common to a wide variety of sounds, including speech (Box 1, Figure 2). This does not negate the findings of motor-cortex activation in response to speech, but it does call into question what the motor-cortex responses to speech actually mean, and whether they can be taken as support for the motor theory of speech perception.
Candidate functions of motor responses
Linguistic functions
There are a variety of linguistic functions that may be served by the motor cortex. Motor activity could reflect a role for the motor cortex (although not necessarily the same subregions) in one, some, or all of the following. First, the primary motor and premotor cortex could be important in phonemic processing, as per the motor theory of speech perception6,16,17. However, as discussed in the previous section, the variability of motor- and premotor-cortex responses to speech sounds, in contrast to the specificity of responses to speech (relative to acoustic baselines) in the temporal lobes, suggests that the motor-cortex response to speech does not reflect a primary, perceptual route to the comprehension of speech (see also 12). This conclusion is supported by the different lesion sites that cause speech perception and production deficits in clinical groups, and by the different developmental profiles of speech perception and production.
Second, the motor cortex could have a role in syntactic processing: it has been suggested that syntax is represented and processed in systems that regulate motor output, including speech production49. In this account, syntax (as a rule-based system) is considered to have evolved from brain mechanisms for processing intention-based action sequences50. Syntax could thus be mediated by motor representations of actions in the perception of spoken language49. Consistent with this possibility, several authors have linked activity in ventral premotor areas with aspects of local structure computation in syntactic processing51,52. Activations in these more posterior premotor areas, which lie posterior to the classic Broca’s area (BA 44/45), have been proposed to interact with higher-order syntactic processing in Broca’s area. Notably, this processing may not be restricted to speech, as activation of the premotor cortex has also been reported in response to aspects of non-speech sequences53.
Third, several authors have suggested a role for motor-cortex representations of action in semantic knowledge54. There is ample evidence that aspects of semantic knowledge involve interactions with the motor system – both behaviourally and in terms of neuroanatomy. Behaviourally, motor tasks influence semantic processing, and semantic processing influences motor task performance. For example, when participants are asked to pick up tools while performing a spatial task, they tend to pick the objects up by their handles, which they are less likely to do if they are simultaneously performing a distracting semantic task55. This was interpreted as showing that the functional grasping of objects requires input from the semantic system. This interaction is consistent with aspects of embodied semantic representations – i.e. semantic representations that are not completely abstract in form54. These semantic-action links extend into cortical activity profiles: studies on the neural representation of action words have shown that such words activate the motor cortex somatotopically, so that hearing a word like ‘kick’ activates the most dorsal part of the motor strip (controlling the leg), whereas a word like ‘pick’ activates an area ventral to this (controlling the arm/hand), and a word like ‘lick’ activates the most ventral part of the motor cortex (controlling the articulators)56. These activations are fast and automatic56. We argue that these effects are probably not restricted to the motor cortex: for example, across both visual and auditory presentations, the perception of words that are highly imageable (such as ‘glove’) activates the visual association cortex in a way that words of low imageability (such as ‘love’) do not57,58. These findings are elegant demonstrations of the recruitment of modality-specific cortex in the representation of semantic knowledge, and show that semantic information is partly embodied in the sensory and motor properties of the things that words refer to.
Task-related functions of the motor cortex
The three alternatives above reflect automatic processing of the auditory speech signal. However, it is also possible that the motor cortex is specifically involved in speech processing when the particular task or listening context demands it – i.e. that motor knowledge is used to support other speech-processing systems47. Thus, stronger responses are seen in both the left and right premotor cortex when participants listen to degraded speech than when they listen to clear speech46, or when they listen to non-native rather than native speech sounds17. Likewise, responses to seeing someone speak are enhanced in the motor cortex when the visual signal is degraded, but not when non-speech signals (such as sticking the tongue out) are degraded59. There is also evidence that there are greater motor responses to rare words than to frequently occurring words60, which is consistent with a role for the motor system in supporting semantic processing. This ‘lexical frequency effect’ occurs later than phonological influences on motor-cortex activation, suggesting that the motor system interacts with lexical systems in the temporal lobes to assist comprehension when we encounter an unfamiliar word60. These task effects might explain why motor responses are inconsistently seen in functional-imaging studies – there might be motor processes that are associated with perceptual difficulty or with the specific things that people are asked to do when they hear the speech, rather than with the basic aspects of speech perception and comprehension10,47.
Sound-to-action functions of motor cortex
A third possibility is that the motor cortex processes the actions that are reflected in sounds – for example, the difference between a plucked and a bowed note on a stringed instrument (Box 1) – actions that are in turn conveyed by relatively simple properties of the sounds (e.g. their onsets). In this account, motor-cortex responses to sound might depend on very basic ways in which a sound is produced by an action (Box 1), and hence motor-cortex responses are seen to a variety of sound categories (Figure 2, Box 2). Importantly, these responses need not require the identification of the sound and could be mediated entirely through the posterior ‘how’ pathway. For example, tracking the rhythm of a repeating sound would involve the ‘how’ pathway, rather than distinguishing through the ‘what’ pathway whether that sound is a drum or a footstep (Box 1). Below, we suggest that these sensation-action links have evolved into something that is important in complex motor co-ordination, and that this co-ordination may have a particular significance for human language use.
Box 2. Sound and speech, action and emotion.
Sounds activate the motor cortex in primates87, which is evidence for an auditory mirror system comprising neurons that are active in both the production and the perception of actions (though see 11). fMRI studies in humans have shown that noises that are made with the hand and with the mouth activate premotor cortex somatotopically, with more dorsal responses to ‘hand’ sounds and more ventral responses to ‘mouth’ sounds88. Thus, the neural response to sound seems to reflect the recruitment of motor processes that are physically linked to the effectors or actions producing the sound. Furthermore, the processing of sound in the motor cortex is plastic and can change with training. Non-musicians who were trained to play a particular piece of music showed activation in the premotor cortex and in action-observation regions in the parietal lobe when they passively listened to the piece, compared with a baseline condition in which the same notes were played in a different order89.
Studies have found that non-verbal expressions of emotion, e.g. laughs and screams, also activate the primary motor and premotor cortex80,90. This effect is modulated by properties of valence and arousal80, with more-positive sounds recruiting motor and premotor cortex more strongly than more-negative expressions. These findings have been linked to the behavioural contagion of positive emotions like laughter91 – when we hear or see someone else smiling or laughing, it can prove difficult to resist laughing along. This motor response to laughter (and also to cheering) is consistent with findings showing that generally, when we like other people, we tend to mirror their posture and even use the same words as them65. This effect can be manipulated by motor behaviour itself: requiring people to perform synchronous, co-ordinated acts together leads later to more co-operative behaviour between them than if they are asked to perform non-co-ordinated joint actions92.
Motor cortex involvement in joint speech
We argue that motor-cortex responses are important in speech processing because they underlie sensorimotor processes that are essential in conversational speech. Speech perception and production are usually studied separately, using simple words or sentences, yet the primary use of spoken language is conversation, which has been described as a true linguistic universal61. Where people have language, they use it to talk to each other in conversations – arguably, we have evolved to speak in dialogue rather than monologue. We learn our first language first and foremost as a joint behaviour, and the turn-taking aspects of conversation are learnt before infants have many words with which to talk62. Thus, we argue that spoken language is relevant to the motor system because it is a behaviour with which we can expertly and smoothly co-ordinate our own actions63.
Convergence in conversation
There is ample evidence that we do indeed co-ordinate various aspects of our speech with that of fellow speakers. Classic studies of dialogue have shown the phenomenon of interactional synchrony64, where talkers unintentionally co-ordinate their actions and postures. Such mirroring of each other’s behaviour allows us to express our affiliation with those with whom we speak65 and also helps the process of conversation itself: in conversation we align our conceptual and syntactic structures to those of our co-talkers66. This linguistic alignment has been argued to be central to making conversation possible67. Conversational utterances are often elliptical and incomplete – e.g. people rarely speak in proper sentences that could be understood outwith the context of the conversation. We therefore continually need to adjust what we say to accommodate the specific knowledge of the other talkers67. This convergence extends to motor aspects of speech control in conversation: when we hold a conversation, we co-ordinate our breathing68 and our pronunciations begin to converge with those of our co-talkers69. The dynamic properties of the control of conversation have been linked to a perception-action pathway in the brain66,67, which maps onto the ‘how’ pathway linking the human auditory cortex and the motor system (Figure 1)9. From this perspective, the tracking of specific articulations that has been demonstrated in the motor cortex19 could facilitate convergence on the pronunciation of words, rather than serve phonetic analysis.
Coordinating conversation – smooth turn-taking
Conversation is found in all human cultures; furthermore, the tight temporal co-ordination of the act of conversation is also universal. Conversation is characterized by several properties70 (Box 3), and of these the most striking is that turn-taking in conversation typically occurs without pauses or overlapping speech. As has been pointed out previously63, conversation represents a “considerable achievement. Conversations unfold in real time, and yet parties to the conversation synchronize their turns – usually highly coherent and consistent with respect to the topic – in a matter of milliseconds”63 (p78). A recent study showed that 45% of turn transitions in over 1500 examples from a corpus of phone calls occurred within a window of −250 to +250 milliseconds, and 85% fell within a window of −750 to +750 milliseconds71. This represents an astonishing level of co-ordination, especially when one considers that it can take place without any visual input (e.g. on the phone) and between complete strangers72, without any detriment to the timing of turn-taking. Smooth and well-coordinated turn-taking is a hallmark of successful conversations, in all cultures and across modalities (e.g. in signing)73. Disrupting the timing of turn-taking seriously disrupts the fluency and ease of an interaction, as anyone who has tried to have a telephone conversation via a bad satellite link with lots of echoes can attest74. Along with Iacoboni75, we argue that the tight temporal co-ordination of turn-taking relies heavily on the motor system. We propose that while the temporal lobe and associated regions, which are important in acoustic and linguistic representation and processing, track the meaning of what is being said, the motor system concurrently tracks the speech rate and rhythm of the current talker, so that picking up the next turn will be a seamless process (Figure 3). In this approach, the motor system is not only crucial to organizing the act of speaking, it is also essential in facilitating the conversational dance of turn-taking, as well as the co-ordination of the other factors, such as interactional synchrony and convergence, which make conversation possible66,67. The motor system may also facilitate an alignment of interactions even when the meaning and content of what is said is highly constrained – for example, complete strangers are able to read a novel piece of text in synchrony with one another, apparently by coordinating the rhythm and melody of their speech76 (Box 1). Here, the acoustic properties of speech facilitate the co-ordination of actions, enabling the accurate entrainment of speech timing by the two talkers77.
Box 3. Principles of conversation70.
Speaker change occurs
One person speaks at a time
Simultaneous speech is common but brief
Transitions with no gap and no overlap are common
Turn order is not fixed, nor is turn size, duration of a turn or content
Number of talkers can vary
Talk can be continuous or discontinuous
Figure 3. Candidate roles for auditory streams of processing during conversation.
The arrows originating from primary auditory cortex illustrate a functional division in how sounds are processed (following from the streams of processing shown in Figure 1). The meaning of sound, especially linguistic meaning in speech, is decoded primarily in the superior temporal lobe, anterior to primary auditory cortex (A1) (light blue arrow). We suggest that aspects of coordinating conversation, in particular turn-taking, are mediated by the posterior ‘how’ pathway (dark blue arrow). This perception-action pathway subserves monitoring of the speech signal for rhythm and rate, which enables the listener to anticipate the end of a speaker’s turn, and thereby facilitates smooth turn-taking.
A specific model of turn-taking has used the concept of entrained oscillations as a way of accounting for the smooth patterns of turn-taking in conversation73. In this model, the listener entrains to the speech rate of another talker at the level of the syllable, and uses this entrainment to accurately time speech output as that talker comes to the end of a turn. We suggest that the computation of these rate and rhythm factors depends primarily on the motor system; this is consistent with interpretations of motor activation in perception tasks as being essential in anticipatory responses78.
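To make the entrainment idea concrete, the sketch below implements a deliberately simple period-correction scheme. It is our own toy reduction of the entrained-oscillator idea, not the model proposed in ref. 73: a listener's internal 'syllable clock' nudges its period towards each observed inter-syllable interval and then extrapolates the next few beats, which is the kind of computation that would let a listener time an incoming turn to the current talker's rate. The onset times and correction gain are hypothetical values chosen for illustration.

```python
# Toy sketch of syllable-rate entrainment for turn timing (illustrative only;
# not the oscillator model of ref. 73). The "listener" keeps an estimate of the
# talker's syllable period, nudging it towards each observed interval, and uses
# it to predict when the next syllables (and hence a turn end) should fall.

# Hypothetical syllable onset times (s) from a talker who gradually speeds up.
onsets = [0.00, 0.26, 0.50, 0.72, 0.93, 1.13, 1.32, 1.50]

period = 0.25          # initial guess at the syllable period (s), assumed
alpha = 0.4            # correction gain: how strongly each interval adjusts the clock

for prev, curr in zip(onsets, onsets[1:]):
    interval = curr - prev
    period += alpha * (interval - period)   # period-correction step

# Extrapolate the entrained clock to anticipate the next few syllable slots.
last = onsets[-1]
predicted = [last + k * period for k in (1, 2, 3)]
print(f"entrained period: {period * 1000:.0f} ms")
print("predicted upcoming syllable times (s):", [round(t, 2) for t in predicted])
```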
Motor activation and cohesive behaviour – some predictions
There is little evidence in the literature that directly bears on our hypothesis. Going back to the clinical findings, it is clear that holding a conversation can be a real problem in aphasic syndromes, and can lead to considerable social isolation. While patients with expressive aphasia are good at conveying their needs, their conversational skills are impaired. In contrast, there is some evidence that in patients with receptive aphasia many of the basic rules of conversation are intact79. However, these are highly indirect sources of evidence, and our hypothesis needs to be directly tested. According to our hypothesis, greater motor activation would occur in response to heard speech when the participant believes that they are in a conversation than when they are hearing a recorded monologue – i.e. there would be greater motor-system involvement when the subject believes that someone is speaking to them, and that they will at some point reply70.
There are currently no neuroscientific studies on motor activations during conversation. Techniques with good temporal resolution, such as TMS, EEG or MEG, would be well placed to capture the dynamic aspects of conversational turn-taking. Studies using fMRI would also be informative once the procedural challenges of scanning someone with fMRI in a realistic conversation had been addressed.
We would also like to suggest that the roles of the motor system in the orchestration of conversation are almost certainly not limited to speech. In fMRI studies, peaks of activation occur in response to heard speech relative to silence16, 17, but strong motor responses also occur to non-verbal (though vocal) expressions of emotion (relative to an acoustic baseline), especially expressions that typically occur in social contexts, such as laughter80 (Figure 2, Box 2). Indeed, the link between the sensorimotor processing of action information in sound and in speech suggests that such tightly co-ordinated motor acts need not be limited to vocal behaviour, and this role could extend to synchronous activity and timing when playing music and singing in polyphonic settings. The acoustic factors that have been proposed to underlie synchrony and rhythm (e.g. the way a sound starts) have been argued to be similar in speech and music81 (Box 1).
Conclusions
Acoustic speech signals can activate motor-cortex areas. In contrast to the predictions of the motor theory of speech perception, this does not seem to reflect phonetic processing of the speech signal. We argue here that the motor cortex is not essential for perceiving spoken language; if it were, it should be more common to see motor-cortex activation to speech in fMRI studies, it should be easier to identify unambiguous speech perception problems following anterior brain lesions, and there should be a clearer relationship between production and perception in development. Instead, the motor activation can be linked to speech in several ways: as an embodiment of aspects of semantic information, as important in processing both linguistic and non-linguistic syntactic information, as a meta-linguistic process that can be invoked depending on task difficulty or specific task requirements, and as a response to the motor information that is expressed in speech and other sounds. In addition, we argue that this basic role for motor responses to heard speech is a glimpse of a much more important role for motor systems in joint speech, or conversation, where the motor cortex is essential for phenomena such as convergence66-69, interactional synchrony64,65, and ensuring smooth turn transitions63,73,75. We argue that speech is processed and represented differently in the temporal lobes and the motor cortex, and that the motor system is involved in a (non-speech-specific) action pathway. According to our hypothesis, this sound-to-action pathway is highly refined in human language, allowing us to talk smoothly in turn with one another even if we are strangers and cannot see one another. Thus, we propose that there is a central role for motor representations and processes in conversation66,67, an essential aspect of human language in its most basic use.
Acknowledgements
SKS, CM and FE are funded by Wellcome Trust Grant WT074414MA. We would like to thank Keith Kluender, Holger Mitterer, Lynne Bernstein, Tom Manly and Matt Davis for very helpful discussions on many of these issues.
Glossary terms
- Speech perception
refers here to the prelexical perceptual processing of the speech signal.
- Speech comprehension
refers here to post-perceptual, lexical, semantic and linguistic processing of speech - while speech comprehension does require good speech perception, comprehension can also be enhanced by higher order syntactic and semantic features (e.g. sentence predictability).
- Embodied
refers here to theories of semantic representations which link the more abstract elements of the representations to more concrete elements of their material properties – for example, part of the meaning of ‘a football’ is represented by how one might kick it.
- Convergence
refers here to the way that different aspects of joint speech (both motoric and linguistic) become united, or co-ordinated, between speakers.
- Phonemes
(such as /p/ or /t/) are considered to be elemental sounds of speech, and can be used in the explicit transcription and classification of the sounds of a language.
- Phonetic
pertaining to speech sounds (phones)
- Articulators
the anatomical structures used to make the sounds of speech, including the tongue, lips, jaw and soft palate.
- Voicing
the sound made by vibrations of the vocal folds; for example, the sound at the start of ‘zoo’ is voiced, whereas that at the start of ‘sue’ is unvoiced.
- Spectral centre of gravity
the average value of the spectral components of a sound, which captures how the sound is weighted across low to high frequencies – for example ‘s’ has a higher spectral centre of gravity than ‘sh’.
- Phone
a single speech sound (which is always a variant of a phoneme), for example, the aspirated /p/ at the start of ‘port’ is a different phone from the /p/ of ‘sport’, but these are both examples (allophones) of the phoneme /p/.
- Diphones
a cluster of two phones which can be legally combined in a language (e.g. /sk/ is legal at the start of a syllable in English, but /ks/ is not); diphones thus contain transitional information between the two phones, and are more information rich than single phones.
- Syllable
like diphones, syllables typically contain information about the organization of speech at a level higher than the phoneme - a single syllable word, like ‘start’, can be broken down into an onset and a rhyme (e.g. st-art), and may consist of only the rhyme (e.g. ‘art’): the rhyme may be further broken down into a nucleus and coda (e.g. ar-t).
- Expressive aphasia
a speech production deficit in which people have reduced fluency, grammatical errors, and problems in articulating speech accurately.
- Receptive aphasia
a speech perception and comprehension deficit in which the patient has great difficulty in following what is being said to them: speech production is unimpaired in terms of fluency though its content can be meaningless, and many patients are unaware that they have a problem.
- Subtraction designs
the subtraction of the activation in one condition from that of another in a functional imaging study, where the aim is to isolate neural responses that are specific to the experimental condition.
- Prelexical processing
in this context, the neural processing of speech sounds prior to the representation of word identity and meaning.
- Linguistics
in this context, the phonemic, semantic or syntactic processing of the heard speech, which is distinct from the processing of the basic acoustic properties of speech (e.g. loudness).
- Phonemic
the representation and processing of phonemes.
- Semantic
the meaning of things, in this case words and language
- Syntax
the rules that determine the correct arrangement and inflection of words in spoken or written language.
- Local structure computation
the sequential analysis of heard speech (the sandwich was eaten), as opposed to higher-order, hierarchical computations across longer time scales (the sandwiches were eaten by the kids at the party).
Biographies
Short Biographies
Sophie K Scott
Sophie Scott is the group leader of the Speech Communication Group at the Institute of Cognitive Neuroscience at UCL. She was awarded a PhD from UCL on the acoustic basis of rhythm in speech in 1994, and spent several years as a post-doctoral researcher at the MRC CBU in Cambridge. She currently holds a Wellcome Trust Senior Fellowship, and has been funded by the Wellcome Trust since 2001. Her research uses functional imaging to investigate the cortical basis of human speech perception and production, and she applies models from primate auditory processing to the neural basis of human perception.
Carolyn McGettigan
Carolyn McGettigan is a Research Associate in the Speech Communication Group at the Institute of Cognitive Neuroscience, University College London, where she completed a Ph.D. in Human Communication. She is a former member of the Centre for Speech, Language and the Brain at the University of Cambridge. Her work combines behavioural and neuroimaging techniques to investigate speech input and output processes in the human brain. She is particularly interested in the role of rhythm in speech comprehension, and in the cognitive factors underlying individual differences in speech processing.
Frank Eisner
Frank Eisner’s work investigates the functional and cortical organization of speech perception, with a special interest in learning and plasticity within this system. After having studied Psychology & Communication in Cardiff, he obtained his PhD from Radboud University Nijmegen. He spent several years as a post-doctoral fellow at University College London and is now a researcher at the Max Planck Institute for Psycholinguistics in the Netherlands.
References
- 1.Kluender KR, Alexander JM. Perception of speech sounds. In: Dallos, Ortel, editors. The Senses, a comprehensive reference. 3 (Audition) Academic Press; San Diego: 2008. pp. 829–860. [Google Scholar]
- 2.Liberman AM, Delattre P, Cooper FS. The role of selected stimulus-variables in the perception of the unvoiced stop consonants. Am. J. Psych. 1952;65(4):497–516. [PubMed] [Google Scholar]
- 3.Liberman AM, Cooper FS, Shankweiler DP, Studdert-Kennedy M. Perception of the speech code. Psych. Rev. 1967;74(6):431–461. doi: 10.1037/h0020279. [DOI] [PubMed] [Google Scholar]
- 4.Liberman AM, Mattingly IG. The motor theory of speech perception revised. Cognition. 1985;21(1):1–36. doi: 10.1016/0010-0277(85)90021-6. [DOI] [PubMed] [Google Scholar]
- 5.Fowler CA. An event approach to the study of speech-perception from a direct realist perspective. J. Phonetics. 1986;14(1):3–28. [Google Scholar]
- 6.Galantucci B, Fowler CA, Turvey MT. The motor theory of speech perception reviewed. Psychon. Bull. Rev. 2005;13(3):361–77. doi: 10.3758/bf03193857. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Diehl RL, Kluender KR. On the objects of speech perception. Ecological Psychology. 1989;1(2):121–144. [Google Scholar]
- 8.Lisker L. Rapid vs rabid: a catalogue of acoustical features that may cue the distinction. Haskins Laboratories Status Report on Speech Research. 1978;SR-54:127–132. [Google Scholar]
- 9.Scott SK, Johnsrude IS. The neuroanatomical and functional organization of speech perception. Trends Neurosci. 2003;26(2):100–107. doi: 10.1016/S0166-2236(02)00037-1. [DOI] [PubMed] [Google Scholar]
- 10.Hickok G, Poeppel D. The cortical organization of speech processing. Nat. Rev. Neurosci. 2007;8(5):393–402. doi: 10.1038/nrn2113. [DOI] [PubMed] [Google Scholar]
- 11.Hickok G. Eight Problems for the Mirror Neuron Theory of Action Understanding in Monkeys and Humans. Journal of Cognitive Neuroscience. 2009;21:1229–1243. doi: 10.1162/jocn.2009.21189. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Lotto AJ, Hickok G, Holt LL. Reflections on Mirror Neurons and Speech Perception. Trends in Cognitive Sciences. 2009;13:110–114. doi: 10.1016/j.tics.2008.11.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Meister IG, Wilson SM, Deblieck C, Wu AD, Iacoboni M. The essential role of premotor cortex in speech perception. Curr Biol. 2007;17(19):1692–1696. doi: 10.1016/j.cub.2007.08.064. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Wise RJS, Greene J, Büchel C, Scott SK. Brain systems for word perception and articulation. Lancet. 1999;353(9158):1057–1061. doi: 10.1016/s0140-6736(98)07491-1. [DOI] [PubMed] [Google Scholar]
- 15.Watkins KE, Strafella AP, Paus T. Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia. 2003;41(8):989–994. doi: 10.1016/s0028-3932(02)00316-0. [DOI] [PubMed] [Google Scholar]
- 16.Wilson SM, Saygin AP, Sereno MI, Iacoboni M. Listening to speech activates motor areas involved in speech production. Nat. Neurosci. 2004;7(7):701–2. doi: 10.1038/nn1263. [DOI] [PubMed] [Google Scholar]
- 17.Wilson SM, Iacoboni M. Neural responses to non-native phonemes varying in producibility: evidence for the sensorimotor nature of speech perception. NeuroImage. 2006;33(1):316–325. doi: 10.1016/j.neuroimage.2006.05.032. [DOI] [PubMed] [Google Scholar]
- 18.Fadiga L, Craighero L, Buccino G, Rizzolatti G. Speech listening specifically modulates the excitability of tongue muscles: a TMS study. Eur. J. Neurosci. 2002;15(2):399–402. doi: 10.1046/j.0953-816x.2001.01874.x. [DOI] [PubMed] [Google Scholar]
- 19.Pulvermüller F, et al. Motor cortex maps articulatory features of speech sounds. Proc. Natl. Acad. Sci. U.S.A. 2006;103(20):7865–7870. doi: 10.1073/pnas.0509989103. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Tardif E, Spierer L, Clarke S, Murray MM. Interactions between auditory ‘what’ and ‘where’ pathways revealed by enhanced near-threshold discrimination of frequency and position. Neuropsychologia. 2008;46(4):958–66. doi: 10.1016/j.neuropsychologia.2007.11.016. [DOI] [PubMed] [Google Scholar]
- 21.Scott SK, Blank CC, Rosen S, Wise RJS. Identification of a pathway for intelligible speech in the left temporal lobe. Brain. 2000;123:2400–2406. doi: 10.1093/brain/123.12.2400. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Wise RJS, Scott SK, Blank SC, Mummery CJ, Warburton E. Identifying separate neural sub-systems within ‘Wernicke’s area’. Brain. 2001;124:83–95. doi: 10.1093/brain/124.1.83. [DOI] [PubMed] [Google Scholar]
- 23.Romanski LM, et al. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat. Neurosci. 1999;2(12):1131–1136. doi: 10.1038/16056. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Repp BH. Phase correction, phase resetting, and phase shifts after subliminal timing perturbations in sensorimotor synchronization. J. Exp. Psychol. Hum. Percept. Perform. 2001;27:600–21. [PubMed] [Google Scholar]
- 25.Tourville JA, Reilly KJ, Guenther FH. Neural mechanisms underlying auditory feedback control of speech. NeuroImage. 2008;39:1429–1443. doi: 10.1016/j.neuroimage.2007.09.054. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Scott SK, Rosen S, Lang H, Wise RJS. Neural correlates of intelligibility in speech investigated with noise-vocoded speech - A positron emission tomography study. J. Acoust. Soc. Am. 2006;120(2):1075–1083. doi: 10.1121/1.2216725. [DOI] [PubMed] [Google Scholar]
- 27.Mohr JP, et al. Broca aphasia – pathologic and clinical. Neurology. 1978;28(4):311–324. doi: 10.1212/wnl.28.4.311. [DOI] [PubMed] [Google Scholar]
- 28.Blank SC, Bird H, Turkheimer F, Wise RJ. Speech production after stroke: the role of the right pars opercularis. Ann Neurol. 2003;54(3):310–20. doi: 10.1002/ana.10656. [DOI] [PubMed] [Google Scholar]
- 29.Crinion JT, et al. Listening to narrative speech after aphasic stroke: The role of the left anterior temporal lobe. Cerebral Cortex. 2006;16(8):1116–1125. doi: 10.1093/cercor/bhj053. [DOI] [PubMed] [Google Scholar]
- 30.Bogen JE, Bogen GM. Wernicke’s region – where is it? Ann. NY Acad. Sci. 1976;280:834–843. doi: 10.1111/j.1749-6632.1976.tb25546.x. [DOI] [PubMed] [Google Scholar]
- 31.Basso A, Casati G, Vignolo LA. Phonemic identification defect in aphasia. Cortex. 1977;13(1):85–95. doi: 10.1016/s0010-9452(77)80057-9. [DOI] [PubMed] [Google Scholar]
- 32. Mogford K. Oral language acquisition in the prelinguistically deaf. In: Bishop DVM, Mogford K, editors. Language Development in Exceptional Circumstances. Churchill Livingstone; NY: 1988. pp. 110–131.
- 33. Bishop DVM. Language development in children with abnormal structure or function of the speech apparatus. In: Bishop DVM, Mogford K, editors. Language Development in Exceptional Circumstances. Churchill Livingstone; NY: 1988. pp. 220–238.
- 34. Werker JF, Yeung HH. Infant speech perception bootstraps word learning. Trends in Cognitive Sciences. 2005;9(11):519–27. doi: 10.1016/j.tics.2005.09.003.
- 35. Tsao F-M, Liu HM, Kuhl PK. Speech perception in infancy predicts language development in the second year of life: a longitudinal study. Child Dev. 2004;75:1067–1084. doi: 10.1111/j.1467-8624.2004.00726.x.
- 36. Bates E, Dick F. Language, gesture, and the developing brain. Dev Psychobiol. 2002;40(3):293–310. doi: 10.1002/dev.10034.
- 37. Alcock KJ, Krawczyk K. Individual differences in language development: Relationship with motor skill at 21 months. Dev Sci. 2010;13:677–691. doi: 10.1111/j.1467-7687.2009.00924.x.
- 38. Wise R, et al. Distribution of cortical neural networks involved in word comprehension and word retrieval. Brain. 1991;114:1803–1817. doi: 10.1093/brain/114.4.1803.
- 39. Mummery CJ, Ashburner J, Scott SK, Wise RJS. Functional neuroimaging of speech perception in six normal and two aphasic subjects. J. Acoust. Soc. Am. 1999;106(1):449–457. doi: 10.1121/1.427068.
- 40. Narain C, et al. Defining a left-lateralized response specific to intelligible speech using fMRI. Cerebral Cortex. 2003;13(12):1362–1368. doi: 10.1093/cercor/bhg083.
- 41. Liebenthal E, Binder JR, Spitzer SM, Possing ET, Medler DA. Neural substrates of phonemic perception. Cerebral Cortex. 2005;15(10):1621–1631. doi: 10.1093/cercor/bhi040.
- 42. Uppenkamp S, Johnsrude IS, Marslen-Wilson W, Patterson RD. Locating the initial stages of speech-sound processing in human temporal cortex. NeuroImage. 2006;31(3):1284–1296. doi: 10.1016/j.neuroimage.2006.01.004.
- 43. Obleser J, Scott SK, Eulitz C. Now you hear it, now you don’t: Transient traces of consonants and their nonspeech analogues in the human brain. Cerebral Cortex. 2006;16(8):1069–1076. doi: 10.1093/cercor/bhj047.
- 44. Obleser J, Eisner F. Pre-lexical abstraction of speech in the auditory cortex. Trends Cogn. Sci. 2009:14–19. doi: 10.1016/j.tics.2008.09.005.
- 45. Patterson K, Nestor PJ, Rogers TT. Where do you know what you know? The representation of semantic knowledge in the human brain. Nat Rev Neurosci. 2007;8(12):976–87. doi: 10.1038/nrn2277.
- 46. Davis MH, Johnsrude IS. Hierarchical processing in spoken language comprehension. J. Neurosci. 2003;23(8):3423–3431. doi: 10.1523/JNEUROSCI.23-08-03423.2003.
- 47. Davis MH, Johnsrude IS, Hervais-Adelman AG, Rogers JC. Motor regions contribute to speech perception: awareness, adaptation and categorization. J. Acoust. Soc. Am. 2008;123:3580.
- 48. Jardri R, et al. Self awareness and speech processing: An fMRI study. NeuroImage. 2007;35(4):1645–1653. doi: 10.1016/j.neuroimage.2007.02.002.
- 49. Fogassi L, Ferrari PF. Mirror neurons and the evolution of embodied language. Current Directions in Psychological Science. 2007;16(3):136–141.
- 50. Greenfield PM. Language, tools and brain: The ontogeny and phylogeny of hierarchically organized sequential behavior. Behavioral and Brain Sciences. 1991;14:531–595.
- 51. Friederici AD. Broca’s area and the ventral premotor cortex in language: Functional differentiation and specificity. Cortex. 2006;42(4):472–475. doi: 10.1016/s0010-9452(08)70380-0.
- 52. Fiebach CJ, Schubotz RI. Dynamic anticipatory processing of hierarchical sequential events: A common role for Broca’s area and ventral premotor cortex across domains? Cortex. 2006;42(4):499–502. doi: 10.1016/s0010-9452(08)70386-1.
- 53. Schubotz RI, von Cramon DY. Functional-anatomical concepts of human premotor cortex: evidence from fMRI and PET studies. NeuroImage. 2003;20:S120–S131. doi: 10.1016/j.neuroimage.2003.09.014.
- 54. Fischer MH, Zwaan RA. Embodied language: A review of the role of the motor system in language comprehension. Q. J. Exp. Psychol. 2008;61(6):825–850. doi: 10.1080/17470210701623605.
- 55. Creem SH, Proffitt DR. Grasping objects by their handles: a necessary interaction between cognition and action. J. Exp. Psychol. Hum. Percept. Perform. 2001;27:218–28. doi: 10.1037//0096-1523.27.1.218.
- 56. Pulvermüller F. Brain mechanisms linking language and action. Nat. Rev. Neurosci. 2005;6(7):576–582. doi: 10.1038/nrn1706.
- 57. Wise RJ, et al. Noun imageability and the temporal lobes. Neuropsychologia. 2000;38:985–94. doi: 10.1016/s0028-3932(99)00152-9.
- 58. Fiebach CJ, Friederici AD. Processing concrete words: fMRI evidence against a specific right-hemisphere involvement. Neuropsychologia. 2004;42:62–70. doi: 10.1016/s0028-3932(03)00145-3.
- 59. Fridriksson J, et al. Motor speech perception modulates the cortical language areas. NeuroImage. 2008;41(2):605–13. doi: 10.1016/j.neuroimage.2008.02.046.
- 60. Roy AC, Craighero L, Fabbri-Destro M, Fadiga L. Phonological and lexical motor facilitation during speech listening: A transcranial magnetic stimulation study. J. Physiol. Paris. 2008;102:101–105. doi: 10.1016/j.jphysparis.2008.03.006.
- 61. Miller GA. Review of Greenberg JH (ed) Universals of Language. Contemporary Psychology. 1963;8:417–18.
- 62. Beebe B, Alson D, Jaffe J, Feldstein S, Crown C. Vocal congruence in mother-infant play. J Psycholinguist Res. 1988;17:245–59. doi: 10.1007/BF01686358.
- 63. Beattie G. Talk: An Analysis of Speech and Non-Verbal Behaviour in Conversation. Open University Press; Milton Keynes: 1983.
- 64. Condon WS, Ogston WD. A segmentation of behavior. J. Psychiat. Res. 1967;5:221–35.
- 65. Chartrand TL, Bargh JA. The Chameleon effect: The perception-behavior link and social interaction. J. Personality & Social Psych. 1999;76:893–910. doi: 10.1037//0022-3514.76.6.893.
- 66. Garrod S, Pickering MJ. Why is conversation so easy? Trends Cogn. Sci. 2004;8:8–11. doi: 10.1016/j.tics.2003.10.016.
- 67. Pickering MJ, Garrod S. Do people use language production to make predictions during comprehension? Trends Cogn Sci. 2007;11:105–10. doi: 10.1016/j.tics.2006.12.002.
- 68. McFarland DH. Respiratory markers of conversational interaction. J. Speech Lang. Hear. Res. 2001;44(1):128–43. doi: 10.1044/1092-4388(2001/012).
- 69. Pardo JS. On phonetic convergence during conversational interaction. J. Acoust. Soc. Am. 2006;119:2382–93. doi: 10.1121/1.2178720.
- 70. Sacks H, Schegloff EA, Jefferson GA. A simplest systematics for the organization of turn-taking in conversation. Language. 1974;50:697–735.
- 71. De Ruiter JP, Mitterer H, Enfield NJ. Projecting the end of a speaker’s turn: A cognitive cornerstone of conversation. Language. 2006;82:515–535.
- 72. Beattie GW, Barnard PJ. The temporal structure of natural telephone conversations (Directory Inquiry calls). Linguistics. 1979;17(3-4):213–229.
- 73. Wilson M, Wilson TP. An oscillator model of the timing of turn-taking. Psychon. Bull. Rev. 2005;12(6):957–968. doi: 10.3758/bf03206432.
- 74. Kitawaki N, Itoh K. Pure delay effects on speech quality in telecommunications. IEEE Journal on Selected Areas in Communications. 1991;9(4):586–593.
- 75. Iacoboni M. Understanding Others: Imitation, Language, Empathy. In: Hurley S, Chater N, editors. Perspectives on Imitation: From Neuroscience to Social Science. Vol. 2. MIT Press; 2005. pp. 77–100.
- 76. Cummins F. Practice and performance in speech produced synchronously. J. Phonetics. 2003;31:139–148.
- 77. Cummins F. Rhythm as entrainment: The case of synchronous speech. J. Phonetics. 2009;37:16–28.
- 78. Prinz W. What re-enactment earns us. Cortex. 2006;42(4):515–517. doi: 10.1016/s0010-9452(08)70389-7.
- 79. Schienberg S, Holland AL. Conversational turn-taking in Wernicke’s aphasia. In: Brookshire RH, editor. Clinical Aphasiology: Conference Proceedings. BRK Publishers; Minneapolis, MN: 1980. pp. 106–110.
- 80. Warren JE, et al. Positive emotions preferentially engage an auditory-motor “mirror” system. J. Neurosci. 2006;26(50):13067–13075. doi: 10.1523/JNEUROSCI.3907-06.2006.
- 81. Scott SK. The point of P-centres. Psychol Res. 1998;61:4–11.
- 82. Warren WH, Jr, Verbrugge RR. Auditory perception of breaking and bouncing events: a case study in ecological acoustics. J. Exp. Psychol. Hum. Percept. Perform. 1984;10:704–12. doi: 10.1037//0096-1523.10.5.704.
- 83. Hove MJ, Keller PE, Krumhansl CL. Sensorimotor synchronization with chords containing tone-onset asynchronies. Percept Psychophys. 2007;69:699–708. doi: 10.3758/bf03193772.
- 84. Gordon JW. The perceptual attack time of musical tones. J. Acoust. Soc. Am. 1987;82:88–105. doi: 10.1121/1.395441.
- 85. Rasch R. Synchronization in performed ensemble music. Acustica. 1979;43:121–131.
- 86. Marcus SM. Acoustic determinants of perceptual centre (P-center) location. Perception and Psychophysics. 1981;30:247–256. doi: 10.3758/bf03214280.
- 87. Kohler E, et al. Hearing sounds, understanding actions: action representation in mirror neurons. Science. 2002;297(5582):846–848. doi: 10.1126/science.1070311.
- 88. Gazzola V, Aziz-Zadeh L, Keysers C. Empathy and the somatotopic auditory mirror system in humans. Curr. Biol. 2006;16(18):1824–1829. doi: 10.1016/j.cub.2006.07.072.
- 89. Lahav A, Saltzman E, Schlaug G. Action representation of sound: audiomotor recognition network while listening to newly acquired actions. J. Neurosci. 2007;27(2):308–14. doi: 10.1523/JNEUROSCI.4822-06.2007.
- 90. Meyer M, Zysset S, von Cramon DY, Alter K. Distinct fMRI responses to laughter, speech, and sounds along the human peri-sylvian cortex. Brain Res Cogn Brain Res. 2005;24(2):291–306. doi: 10.1016/j.cogbrainres.2005.02.008.
- 91. Provine RR. Contagious laughter: laughter is a sufficient stimulus for laughs and smiles. Bull. Psychonomic Soc. 1992;30:1–4.
- 92. Wiltermuth SS, Heath C. Synchrony and Cooperation. Psychological Science. 2008;20:1–5. doi: 10.1111/j.1467-9280.2008.02253.x.
- 93. Rauschecker JP, Tian B. Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proc. Natl. Acad. Sci. U. S. A. 2000;97:11800–6. doi: 10.1073/pnas.97.22.11800.
- 94. Binder JR, Swanson SJ, Hammeke TA, Sabsevitz DS. A comparison of five fMRI protocols for mapping speech comprehension systems. Epilepsia. 2008;49(12):1980–1997. doi: 10.1111/j.1528-1167.2008.01683.x.
- 95. Wilson SM, Molnar-Szakacs I, Iacoboni M. Beyond superior temporal cortex: intersubject correlations in narrative speech comprehension. Cereb Cortex. 2008;18(1):230–42. doi: 10.1093/cercor/bhm049.
- 96. Scott SK, Rosen S, Wickham L, Wise RJS. A positron emission tomography study of the neural basis of informational and energetic masking effects in speech perception. J. Acoust. Soc. Am. 2004;115(2):813–821. doi: 10.1121/1.1639336.
- 97. Callan DE, et al. Song and speech: Brain regions involved with perception and covert production. NeuroImage. 2006;31(3):1327–1342. doi: 10.1016/j.neuroimage.2006.01.036.
- 98. Doehrmann O, Naumer MJ, Volz S, Kaiser J, Altmann CF. Probing category selectivity for environmental sounds in the human auditory brain. Neuropsychologia. 2008;46(11):2776–2786. doi: 10.1016/j.neuropsychologia.2008.05.011.
- 99. Lewis JW, Brefczynski JA, Phinney RE, Janik JJ, DeYoe EA. Distinct cortical pathways for processing tool versus animal sounds. Journal of Neuroscience. 2005;25(21):5148–5158. doi: 10.1523/JNEUROSCI.0419-05.2005.
- 100. Bangert M, et al. Shared networks for auditory and motor processing in professional pianists: Evidence from fMRI conjunction. NeuroImage. 2006;30(3):917–926. doi: 10.1016/j.neuroimage.2005.10.044.