Proceedings of the National Academy of Sciences of the United States of America. 2011 Nov 23;108(51):E1441–E1450. doi: 10.1073/pnas.1115267108

Long-term music training tunes how the brain temporally binds signals from multiple senses

HweeLing Lee, Uta Noppeney
PMCID: PMC3251069; PMID: 22114191

Abstract

Practicing a musical instrument is a rich multisensory experience involving the integration of visual, auditory, and tactile inputs with motor responses. This combined psychophysics–fMRI study used the musician's brain to investigate how sensory-motor experience molds temporal binding of auditory and visual signals. Behaviorally, musicians exhibited a narrower temporal integration window than nonmusicians for music but not for speech. At the neural level, musicians showed increased audiovisual asynchrony responses and effective connectivity selectively for music in a superior temporal sulcus-premotor-cerebellar circuitry. Critically, the premotor asynchrony effects predicted musicians’ perceptual sensitivity to audiovisual asynchrony. Our results suggest that piano practicing fine tunes an internal forward model mapping from action plans of piano playing onto visible finger movements and sounds. This internal forward model furnishes more precise estimates of the relative audiovisual timings and hence, stronger prediction error signals specifically for asynchronous music in a premotor-cerebellar circuitry. Our findings show intimate links between action production and audiovisual temporal binding in perception.

Keywords: audiovisual synchrony, multisensory integration, sensorimotor learning, crossmodal integration, experience-dependent plasticity


Practicing a musical instrument is a rich multisensory experience involving the integration of visual, auditory, and tactile inputs with motor responses. The musician's brain, thus, provides an ideal model to study experience-dependent plasticity in humans (1, 2).

Previous research in musicians has focused on neural plasticity affecting unisensory and motor processing. Little is known about how musical expertise alters the integration of inputs from multiple senses. Because musical performance requires precise timing, musical expertise may specifically modulate the temporal binding of sensory signals. Given the variability in physical and neural transmission times, sensory signals do not have to be precisely synchronous but must co-occur within a temporal window that flexibly adapts to the temporal statistics of the sensory inputs as a consequence of music (3) or audiovisual training (4). At the neural level, audiovisual (a)synchrony processing relies on a widespread neural system encompassing subcortical, primary sensory, higher-order association, cerebellar, and premotor areas (5–8).

This study used the musician's brain as a model to investigate how long-term sensory-motor experience (i.e., piano practicing) shapes the neural processes underlying temporal binding of auditory and visual signals. We presented subjects with synchronous and asynchronous speech and piano music as two stimulus classes that are both characterized by a rich hierarchical temporal structure but linked to different motor effectors (mouth vs. hand). Comparing the effect of musical expertise on synchrony perception of speech and music allowed us to dissociate generic and context-specific neural mechanisms by which piano practicing fine tunes audiovisual temporal binding and synchrony perception.

Generic mechanisms of musical expertise may rely on experience-driven plasticity affecting sensory and particularly, auditory processing. Brainstem responses in musicians relative to nonmusicians have recently been shown to be faster, larger, and more reliable when encoding the periodicity of speech and music (9, 10). Importantly, the enhanced auditory processing skills transferred from music to speech (11). Hence, if music training induces a general sensitization to audiovisual temporal (mis)alignment, we would expect a narrower temporal integration window and increased neural audiovisual (a)synchrony effects along the auditory processing hierarchy in musicians for both music and speech.

Context-specific mechanisms of musical expertise may rely on the formation of internal forward models that are fine tuned to specific motor tasks and effectors. Internal forward models have been invoked as a mechanism not only for motor control but also for motor and perceptual timing in the unisensory domains (12–14). They are learned by error feedback in interactions with the environment and are thought to be instantiated in a cortico-cerebellar circuitry. Specifically, piano practicing may fine tune an internal forward model that maps from the motor plan of piano playing onto its sensory consequences (i.e., the visible finger movements and concurrent auditory sounds). Because piano playing generates sensory signals in multiple modalities, the internal forward model indirectly also furnishes predictions about the relative timings of the auditory and visual signals, leading to a narrower temporal binding window. We would, therefore, expect audiovisual asynchronous stimuli that violate the model's temporal predictions to elicit an error signal within this cortico-cerebellar circuitry selectively for music and not for speech, which relies on different motor effectors and sensory-motor transformations (12–15).

Results

Eighteen musicians and nineteen nonmusicians participated in the psychophysics study [before functional MRI (fMRI)] and the fMRI study. During the psychophysics study, subjects explicitly judged the audiovisual synchrony of speech sentences and piano music at 13 levels of audiovisual stimulus onset asynchrony (AV-SOA; ±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). From the proportion of synchronous responses, we estimated the temporal integration window for each subject. In the fMRI study (2–8 wk later), subjects were presented with the same set of speech and music material. The stimuli were presented synchronously and asynchronously with a temporal offset of ±240 ms (Fig. S1). This AV-SOA level was associated with an average proportion of synchronous responses of 33.6% across subjects and stimulus classes in our psychophysics study (Fig. 1). During the fMRI study, subjects passively perceived the audiovisual stimuli, allowing us to evaluate automatic (a)synchrony effects in motor, premotor, and prefrontal regions unconfounded by motor responses and task-induced processes (e.g., response selection demands). Inside the scanner, subjects’ performance (e.g., fixation) was monitored using eye tracking to ensure that they attended to the visual and auditory stimulus components (SI Results, Eye Movement Monitoring).

Fig. 1. The psychometric functions for speech (Left) and music (Right) in musicians (black, M+) and nonmusicians (gray, M−) from the psychophysical experiment before the fMRI study.

Psychophysics Experiment.

For each subject, psychometric functions were estimated separately for speech and music from the proportion of synchronous responses at each AV-SOA level [SI Experimental Procedures, Behavioral Analysis: Psychophysics Study (Before fMRI Study)]. The audiovisual temporal integration window was defined as the integral of the fitted psychometric function bounded by ±360 ms. As shown in Fig. 1, the temporal integration window was narrower for musicians than nonmusicians for music [musicians (mean ± SEM): 1.10 ± 0.07, nonmusicians: 1.54 ± 0.09] but not for speech (musicians: 1.37 ± 0.07, nonmusicians: 1.47 ± 0.06). This impression was statistically validated in a mixed-design ANOVA of the integral with stimulus class (music vs. speech) as within-subject factor and group (musicians vs. nonmusicians) as between-subject factor, which showed a main effect of group [F(1,35) = 8.67, P < 0.01] and stimulus class [F(1,35) = 5.26, P < 0.05] and a group by stimulus class interaction [F(1,35) = 13.2, P = 0.001]. Post hoc testing confirmed that musicians exhibited a narrower temporal integration window than nonmusicians for music [t(35) = 3.92, P < 0.001] but not for speech [t(35) = 1.08, P = 0.14]. Furthermore, paired-samples t tests comparing the temporal integration windows for speech and music within each group showed that musicians displayed a narrower temporal integration window for music relative to speech [t(18) = 4.29, P < 0.001], whereas the temporal integration windows for speech and music did not differ in nonmusicians [t(18) < 1, nonsignificant (n.s.)].
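For concreteness, a minimal sketch of this analysis follows (our assumed re-implementation in Python, not the authors' code): it computes each subject's integration window as the integral of the fitted psychometric function over ±360 ms and runs the reported post hoc comparisons with scipy. The division of the integral by the 360-ms half-window is our assumption about the normalization behind the values reported above, and all variable names are hypothetical.

```python
# Sketch (assumed re-implementation): temporal integration window and
# post hoc tests. The 1/360 normalization is an assumption.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import ttest_ind, ttest_rel

def integration_window(soa_ms, p_sync_fitted):
    """Integral of the fitted proportion-synchronous curve over +/-360 ms,
    expressed in units of the 360-ms half-window (assumed normalization)."""
    return trapezoid(p_sync_fitted, soa_ms) / 360.0

def posthoc_tests(win_music_M, win_music_N, win_speech_M, win_speech_N):
    """Hypothetical per-subject windows for musicians (M) / nonmusicians (N)."""
    music_between = ttest_ind(win_music_M, win_music_N)     # M+ vs M-, music
    speech_between = ttest_ind(win_speech_M, win_speech_N)  # M+ vs M-, speech
    music_vs_speech_M = ttest_rel(win_music_M, win_speech_M)
    music_vs_speech_N = ttest_rel(win_music_N, win_speech_N)
    return music_between, speech_between, music_vs_speech_M, music_vs_speech_N
```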

Collectively, these results show that piano practicing narrows the temporal integration window significantly for music but not for speech, indicating that piano practicing fine tunes audiovisual temporal binding and synchrony perception through a context-specific mechanism.

fMRI Experiment.

(A)synchrony system for music and speech.

We first identified candidate regions that are sensitive to the temporal (mis)alignment of the audiovisual signals. Asynchronous relative to synchronous conditions (pooled over musicians and nonmusicians) increased activation in a widespread neural system encompassing bilateral superior/middle temporal gyri/sulci, bilateral occipital and fusiform gyri, left premotor cortex, and bilateral cerebellar cortices. This asynchrony sensitive system was largely shared by music and speech (Fig. S2 and Tables S1 and S2). Indeed, the direct comparison of asynchrony effects for speech and music [i.e., the interaction between audiovisual (a)synchrony and stimulus class] did not reveal any asynchrony effects that were selective for either music or speech.

Effect of musical expertise on synchrony processing.

We then investigated whether and how musical expertise shapes the neural processes underlying audiovisual synchrony perception. Specifically, we expected piano practicing to mold audiovisual synchrony processing for music.

Separately for speech and music, we, therefore, identified asynchrony effects that are (i) common [i.e., (conjunction-null) conjunction analysis] and (ii) different [i.e., the interaction between audiovisual (a)synchrony and group] for musicians and nonmusicians.

For speech, musicians (M+) and nonmusicians (M−) showed common asynchrony effects in bilateral posterior superior temporal sulci/gyri (STS/STG) and left cerebellum, with a nonsignificant asynchrony effect in right cerebellum (x = +22, y = −74, z = −38, z score = 2.56) (Fig. 2 A and B and Table S3). However, no asynchrony effects were observed that differed for musicians and nonmusicians. These results suggest that piano practicing does not significantly affect (a)synchrony processing in speech perception.

Fig. 2. Asynchrony effects for speech that are common in nonmusicians (M−) and musicians (M+). (A) Asynchrony effects for speech averaged across both groups (yellow) and common in both groups (green) are rendered on a template brain. (B Left) Asynchrony effects for speech that are common in both groups are displayed on a coronal slice of a normalized structural image (averaged across subjects). (B Center and Right) Fitted event-related BOLD responses (lines) and peristimulus time histograms (markers) at the given coordinate are displayed as a function of poststimulus time (PST; averaged over sessions and subjects). Insets show contrast estimates (across subjects’ mean ± SEM) of the asynchrony (async − sync) effect in arbitrary units (corresponding to percentage of whole-brain mean) for musicians (black, M+) and nonmusicians (gray, M−). Z scores pertain to the comparison between M+ and M−; a positive z score indicates that the asynchrony effect is greater in musicians relative to nonmusicians (and vice versa). (C Left) Neural asynchrony effects for speech that are significantly predicted by subjects’ perceptual asynchrony sensitivity. (C Right) Scatter plot depicting the regression of neural asynchrony effects for speech on perceptual asynchrony sensitivity in musicians (black, M+) and nonmusicians (gray, M−).

For music, we identified common asynchrony effects for musicians (M+) and nonmusicians (M−) in left extrastriate cortex (Fig. 3 A and C and Table S3). Crucially, musicians (M+) relative to nonmusicians (M−) showed enhanced asynchrony effects in left superior precentral sulcus (anterior premotor cortex), right posterior STS/middle temporal gyrus (MTG), and left cerebellum (Fig. 3B and Table 1). As shown in the percent signal change plots, asynchrony effects for music in the right posterior STS/MTG emerged primarily for musicians (Fig. 3C). More specifically, the right posterior STS/MTG showed a robust asynchrony effect in musicians (z score = 4.0) but only a less reliable synchrony effect (z score = 2.4) in nonmusicians. In the neighboring voxel (x = +60, y = −40, z = −2), the synchrony effect for nonmusicians was negligible (z score = 1.6), whereas the asynchrony effect in musicians was even more robust (z score = 4.4).

Fig. 3. Asynchrony effects for music that are common (A) and distinct (B) in nonmusicians (M−) and musicians (M+). (A) Asynchrony effects for music averaged across both groups (yellow) and common in both groups (green) are rendered on a template brain. (B) Asynchrony effects for music that are enhanced for musicians relative to nonmusicians are rendered on a template brain. (C Left) Asynchrony effects for music that are selective for musicians are displayed on a coronal slice of a normalized structural image (averaged across subjects). (C Center and Right) Fitted event-related BOLD responses (lines) and peristimulus time histograms (markers) at the given coordinate are displayed as a function of poststimulus time (PST; averaged over sessions and subjects). Insets show contrast estimates (across subjects’ mean ± SEM) of the asynchrony (async − sync) effect in arbitrary units (corresponding to percent of whole-brain mean) for musicians (black, M+) and nonmusicians (gray, M−). Z scores pertain to the comparison between M+ and M−; a positive z score indicates that the asynchrony effect is greater in musicians relative to nonmusicians (and vice versa).

Table 1. Asynchrony effects that are modulated by musical expertise [i.e., the interaction between audiovisual (a)synchrony and musical expertise]

Asynchrony effects for speech that are enhanced in musicians (M+ > M− for async > sync for speech): none.

Asynchrony effects for music that are enhanced in musicians (M+ > M− for async > sync for music):

Brain region                                 Cluster size   MNI x, y, z     z score (peak)   P value
R. posterior STS/middle temporal gyrus       73             62, −40, −4     4.5              0.03
L. superior precentral sulcus/L. premotor    81             −42, 20, 50     4.4              0.04
L. cerebellum (Crus II/VIIb)                 182            −32, −60, −38   4.3              —

P values are corrected at the peak level for multiple comparisons within the search volume of interest (see Experimental Procedures, Search Volume Constraints); —, not reported.

By contrast, in the case of speech, the right posterior STS/MTG showed asynchrony effects for both musicians and nonmusicians. This activation profile highlights the role of prior sensory-motor experience (available for speech in both groups but for music only in musicians) in tuning the neural systems involved in automatic audiovisual asynchrony detection.

A profile similar to that in the right posterior STS/MTG was also observed in the left anterior premotor cortex. In the anterior premotor cortex (left superior precentral sulcus), asynchrony effects for music were strongly modulated by musical expertise and amplified for musicians. Surprisingly, the left anterior premotor cortex exhibited a cross-over interaction. In other words, the most anterior premotor cortex showed activation increases for asynchronous stimuli in musicians but for synchronous stimuli in nonmusicians. Additional exploration of the activation in musicians and nonmusicians revealed that (i) the asynchrony effects in musicians were located in the posterior and anterior portions of premotor cortex, whereas (ii) the synchrony effects in nonmusicians extended from the superior frontal gyrus as part of the deactivation network. Because the synchrony effects in the superior frontal gyrus were not significant given our statistical criteria, they are not discussed further.

Collectively, our fMRI and behavioral results indicate that piano practicing shapes automatic audiovisual temporal binding by a context-specific neural mechanism selectively for music and not for speech. Indeed, a three-way interaction confirmed that the modulation of the asynchrony effects by musical expertise was greater for music than speech (at P < 0.001, uncorrected) in left superior precentral sulcus (x = −42, y = +20, z = +50, z score = 4.2), cerebellum (x = −32, y = −64, z = −38, z score = 4.0), and right posterior STS/MTG (x = +62, y = −40, z = −4, z score = 3.7).

Relationship between subject-specific perceptual and neural asynchrony effects.

We investigated whether subjects’ perceptual sensitivity to audiovisual asynchrony as measured in the psychophysical study predicted their individual asynchrony-induced activation enhancement separately for speech and music (i.e., the contrast estimate pertaining to asynchronous − synchronous conditions for speech or music; referred to as the neural asynchrony effect). To relate perceptual and neural effects more closely, we determined each subject's perceptual asynchrony sensitivity as the difference in proportion of synchronous responses between synchronous and asynchronous conditions, separately for speech and music. The regression analysis was constrained to all voxels showing an asynchrony effect for speech [respectively (resp.), for music] at P < 0.001, uncorrected (Experimental Procedures, Search Volume Constraints).
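As an illustration of this analysis, the sketch below (an assumed plain-numpy re-implementation, not the SPM regression actually used) fits group-specific slopes so that the slope difference between musicians and nonmusicians (the interaction reported below) can be read off directly; all variable names are hypothetical.

```python
# Sketch (assumed, not the authors' SPM analysis): regress per-subject
# neural asynchrony effects (async - sync contrast estimates at a voxel)
# on perceptual asynchrony sensitivity with group-specific slopes.
import numpy as np

def group_regression(neural, perceptual, is_musician):
    """neural, perceptual: (n,) arrays; is_musician: (n,) boolean array."""
    g = is_musician.astype(float)
    X = np.column_stack([
        np.ones_like(perceptual),   # common intercept
        g,                          # group offset (M+ vs M-)
        perceptual * (1 - g),       # regression slope in nonmusicians
        perceptual * g,             # regression slope in musicians
    ])
    beta, *_ = np.linalg.lstsq(X, neural, rcond=None)
    slope_interaction = beta[3] - beta[2]  # slope difference (M+ minus M-)
    return beta, slope_interaction
```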

For speech, the perceptual asynchrony sensitivity significantly predicted subjects’ neural asynchrony effects in left cerebellum in both musicians and nonmusicians (Table 2). In other words, the more accurately subjects discriminated between synchronous and asynchronous conditions in the psychophysical study, the stronger were their asynchrony effects in left cerebellum during speech perception (Fig. 2C).

Table 2. Regression analyses of neural asynchrony effects on perceptual asynchrony sensitivity for speech and music

Neural asynchrony effects for speech that are significantly predicted by musicians’ + nonmusicians’ perceptual asynchrony sensitivity for speech:

Brain region                                 Cluster size   MNI x, y, z     z score (peak)   P value
L. cerebellum (Crus II/VIIb)                 25             −16, −76, −40   3.7              0.008
L. cerebellum (Crus II/VIIb)                                −18, −74, −34   3.7              0.008

Neural asynchrony effects for music that are significantly predicted by musicians’ + nonmusicians’ perceptual asynchrony sensitivity for music: none.

Neural asynchrony effects for music that are significantly more predicted by musicians’ than nonmusicians’ perceptual asynchrony sensitivity for music:

Brain region                                 Cluster size   MNI x, y, z     z score (peak)   P value
L. superior precentral sulcus/L. premotor    1              −50, 16, 44     3.2              0.058
L. superior precentral sulcus/L. premotor    2              −48, 18, 46     3.2              0.059

P values are corrected at the peak level for multiple comparisons within the search volume of interest (see Experimental Procedures, Search Volume Constraints).

For music, the perceptual asynchrony sensitivity significantly predicted the neural asynchrony effects in the left premotor cortex in musicians but not in nonmusicians (more specifically, we observed a significant interaction; i.e., a difference in regression slopes) (Fig. 4E and Table 2). Hence, this analysis provided additional evidence that music training influences audiovisual synchrony perception by a context-specific mechanism that does not generalize to speech.

Fig. 4. Cortical hierarchy of audiovisual asynchrony effects from posterior to anterior areas in the left frontal cortex. (A) Stimulus-evoked activations for speech and music (blue, M+ + M− asynchronous + synchronous > fixation for speech + music), asynchrony effects for speech and music (orange, M+ + M− asynchronous > synchronous for speech + music), and asynchrony effects that are enhanced for musicians relative to nonmusicians (red, M+ > M− asynchronous > synchronous for music) are rendered on a template flattened brain. (B–D) Fitted event-related BOLD responses (lines) and peristimulus time histograms (markers) at the given coordinate are displayed as a function of poststimulus time (PST; averaged over sessions and subjects). Insets show contrast estimates (across subjects’ mean ± SEM) of the asynchrony (async − sync) effect in arbitrary units (corresponding to percent of whole-brain mean) for musicians (black, M+) and nonmusicians (gray, M−). Z scores pertain to the comparison between M+ and M−; a positive z score indicates that the asynchrony effect is greater in musicians relative to nonmusicians (and vice versa). (E) Scatter plot depicting the regression of the neural asynchrony effects for music on perceptual asynchrony sensitivity in musicians (black, M+) and nonmusicians (gray, M−).

Interestingly, for both speech and music, the asynchrony effects that were predicted by subjects’ perceptual asynchrony sensitivity were found in left cerebellum and premotor cortex (i.e., two brain areas that are traditionally associated with higher-order motor processing like motor planning and sequencing and less so with sensory processing) (16). However, perhaps surprisingly, the asynchrony effects were consistently more pronounced in the left cerebellar hemisphere (i.e., ipsilateral to the premotor activation), although they were observed bilaterally at a lower statistical threshold of significance. This response profile might be explained by the specific role of the left cerebellum in temporal processing within the millisecond range, which is particularly relevant for audiovisual asynchrony detection in the current paradigm (17, 18).

Temporal processing along the motor hierarchy.

To further investigate the role of the motor system in audiovisual temporal processing, we characterized the anatomical relation of the following three effects in motor areas: (i) activation induced by processing audiovisual speech or music action sequences relative to fixation (i.e., music + speech in M+ + M− > fixation), (ii) asynchrony effects pooled over stimulus class and group (i.e., asynchronous > synchronous for music + speech in M+ + M−), and (iii) the asynchrony effects for music that are modulated by musical expertise (i.e., asynchronous > synchronous for music for M+ > M−). Collectively, these three effects identified a processing hierarchy within the motor system.

In fact, the asynchrony effects emerged progressively along the motor cortical hierarchy when moving from left posterior primary to left anterior premotor cortices (Fig. 4). Thus, the primary motor cortex was activated generally by music and speech but did not show a significant asynchrony effect for any stimulus class (Fig. 4B) (M+ + M− for music + speech > fixation; x = −28, y = −22, z = +64, z score = 5.3, P < 0.05, corrected for the entire brain). The posterior premotor cortex was sensitive to audiovisual asynchrony and showed increased activation for asynchronous relative to synchronous conditions averaged across groups and stimulus classes (Fig. 4C) (asynchronous > synchronous for music + speech in M+ + M−; x = −34, y = 0, z = +64, z score = 4.8, P < 0.05, corrected for the entire brain). Finally, in the anterior premotor cortex (left superior precentral sulcus), the asynchrony effects for music were strongly modulated by musical expertise and amplified for musicians (Fig. 4D) (asynchronous > synchronous for music for M+ > M− as reported above and in Table 2).

Dynamic causal modeling results.

Fig. 5B shows the exceedance probabilities of each model in our factorial 6 × 6 model space. In nonmusicians, model 22 is the winning model (P = 0.14), followed by model 23 (P = 0.10). In musicians, model 23 is the winning model (P = 0.41), followed by model 29 (P = 0.11). Family-level inference confirmed these results. In nonmusicians, the highest exceedance probabilities were assigned to the model families with (i) speech asynchrony affecting the cerebellum → STS connection (P = 0.65) and (ii) music asynchrony affecting the cerebellum → STS connection (P = 0.61). In musicians, the highest exceedance probabilities were assigned to the model families with (i) speech asynchrony affecting the cerebellum → STS connection (P = 0.84) and (ii) music asynchrony affecting the premotor → STS connection (P = 0.94).

Fig. 5. (A) Basic DCM. From this basic DCM, 36 candidate DCMs were generated by factorially manipulating the connection that was modulated by music asynchrony or speech asynchrony. (B) Bayesian model comparison (random effects analysis) for (i) musicians and (ii) nonmusicians. The matrix shows the exceedance probability of the 36 DCMs in a factorial fashion; the abscissa shows the effect of speech asynchrony, and the ordinate shows the effect of music asynchrony. In nonmusicians, model 22 is associated with the highest exceedance probability; in musicians, model 23 is associated with the highest exceedance probability. (C) The strengths (mean ± SEM; nonbold numbers) of the intrinsic, extrinsic, and modulatory connections for the averaged model and their posterior probability of being different from zero (bold numbers) in the (i) nonmusicians and (ii) musicians groups.

Because the winning models in nonmusicians and musicians were different, we averaged across the two top models (i.e., M22 and M23) separately in nonmusicians and musicians. Fig. 5C shows the strengths (nonbold numbers) of the intrinsic, extrinsic, and modulatory connections for the averaged model in nonmusicians and musicians and their posterior probability of being different from zero (bold numbers). The nonbold numbers by the modulatory effects index the change in coupling (i.e., responsiveness of the target region to activity in the source region) induced by asynchronous music or speech at the group level. Asynchronous speech enhanced the strength of the cerebellum → STS connection similarly in both musicians and nonmusicians. In contrast, asynchronous music increased the strength of the premotor → STS connection selectively in musicians. Indeed, the modulatory effect of asynchronous music on the premotor → STS connection was significantly stronger in musicians than nonmusicians (posterior probability = 1.0). Thus, asynchronous music and speech increased the connection strengths from premotor and cerebellum to STS. Specifically, both connections were inhibitory when subjects were presented with synchronous music or speech signals (i.e., premotor and cerebellum suppress activation in STS). However, when subjects were presented with audiovisual asynchronous signals, the connections became excitatory and induced a prediction error signal within STS, propagating throughout the STS-premotor-cerebellar circuitry.

Most importantly, musical expertise changed the dynamics in this circuitry; it increased the modulatory effect of music asynchrony on the premotor → STS connection, thus turning an inhibitory pathway for synchronous music into an excitatory pathway for asynchronous music in musicians.

Discussion

Our psychophysical and fMRI data show that piano practicing molds audiovisual temporal binding and synchrony perception by a context-specific neural mechanism. Behaviorally, musicians relative to nonmusicians exhibited a significantly narrower temporal integration window for music but not for speech. At the neural level, musicians showed increased audiovisual asynchrony effects and effective connectivity for music in an STS-premotor-cerebellar circuitry. Collectively, these results suggest that piano practicing provides more precise estimates of the relative audiovisual timings in music by fine tuning an internal forward model that maps from action plans of piano playing onto visible finger movements and concurrent piano sounds.

Accumulating evidence suggests that long-term music training changes auditory processing throughout the cortical hierarchy and produces behavioral benefits beyond music performance (11, 19), most prominently in speech processing (10). In contrast to these generic auditory processing benefits, musical expertise in our study sensitized subjects to the audiovisual temporal relationship selectively for music, with no significant transfer to speech processing (3).

At the neural level, asynchronous relative to synchronous conditions increased activation in a distributed neural system encompassing bilateral STS/MTG, occipital and fusiform gyri, and premotor and cerebellar cortices (6, 7). This asynchrony system was largely shared by speech and music, with no asynchrony effects that were specific to speech or music. Despite this common neural asynchrony system, piano practicing significantly modulated the neural asynchrony effects specifically for music but not for speech, thus replicating the context specificity already indicated at the behavioral level. Although speech elicited comparable asynchrony effects in musicians and nonmusicians in posterior STS/MTG bilaterally and left cerebellum, music evoked increased asynchrony effects for musicians relative to nonmusicians in left premotor cortex, left cerebellum, and right posterior STS/MTG. Hence, audiovisual asynchrony is detected automatically not only along the sensory processing hierarchies and classical audiovisual integration areas such as STS (7, 20, 21) but also in a premotor-cerebellar circuitry. Importantly, the asynchrony effects within the premotor-cerebellar circuitry depended on the availability of prior sensory-motor experience. In line with humans’ generic speech expertise, the asynchrony effects were observed in left premotor cortex and cerebellum for speech in both groups but for music primarily in musicians, who were endowed with the relevant motor repertoire of piano playing.

Natural connected speech and piano music are characterized by a rich hierarchical temporal structure and linked to the motor system by different effectors. Although it is well-established that even passive speech and music perception implicitly activates parts of the motor system (22, 23), our results reveal a more fine-grained cortical hierarchy within the motor system. (i) The primary motor cortex coactivated during passive perception of music and speech actions, irrespective of the temporal relationship of the audiovisual signals. (ii) The premotor cortex was sensitive to audiovisual asynchrony and activated more for asynchronous relative to synchronous conditions (Fig. 4). (iii) Critically, in the anterior premotor cortex, audiovisual asynchrony responses for music were modulated by subjects’ prior sensory-motor experience of piano playing. Furthermore, in musicians only, they were also modulated by subjects’ perceptual asynchrony sensitivity for music.

Collectively, these results suggest that sensory-motor experience enables the engagement of a premotor-cerebellar circuitry as a supplementary mechanism to determine the temporal (mis)alignment of auditory and visual signals. Previous neurophysiological, functional imaging, and lesion studies have implicated the cerebellum and premotor cortex in the perception (22, 24, 25) and production (e.g., tapping in synchrony with musical rhythms) (26, 27) of musical and, in particular, rhythmic sequences. Activation in the dorsal premotor cortex was modulated by the metric structure of the auditory stimulus (27), subjects’ cognitive set (e.g., motor imagery) (28), and their musical expertise (28, 29). Furthermore, cerebellum and premotor cortex were involved in motor and perceptual timing, particularly in the millisecond range (12, 30–35). Patients with cerebellar lesions showed increased variability on temporal production tasks such as rhythmic tapping (36). Temporal relative to spatial prediction tasks increased activation in the posterior cerebellum (12).

Computationally, the cerebellum is thought to instantiate a forward model that maps from motor (and even cognitive) plans onto their sensory consequences, learned by error feedback from real-life sensory-motor experience (13, 14, 37). Because speech production and piano playing induce concurrent visible movements and auditory outputs, the internal forward model indirectly furnishes predictions about the relative timings of auditory outputs and visual movements. Hence, asynchronous music (in musicians) and speech (in both groups) that violate these temporal predictions induce a prediction error signal in this premotor-cerebellar circuitry (14, 15). Thus, the forward model instantiates a supplementary motor-based mechanism that enables more precise audiovisual temporal estimates by simulating actions and their effects. From a more cognitive perspective, this motor-based mechanism may manifest itself in the superior skills for action imagery, simulation, and imitation in musicians (related studies of action observation and/or imagery in pianists are in refs. 28, 29, and 38).

The functional relevance of left cerebellum and premotor cortex is also supported by the relationship of subjects’ perceptual sensitivity to audiovisual asynchrony (as measured outside the scanner) and their neural asynchrony responses. For speech, the cerebellar asynchrony effects were significantly predicted by the perceptual sensitivity to audiovisual asynchrony in both nonmusicians and musicians. For music, the anterior premotor asynchrony effects were significantly predicted by the perceptual sensitivity to audiovisual asynchrony in musicians only. These results cannot be explained by explicit motor responses, because subjects were engaged in passive speech and music perception inside the scanner. Also, they cannot be explained by eye movement artifacts, because subjects were fixating and eye movements (as measured during fMRI) were not significantly different between synchronous and asynchronous trials (SI Results, Eye Movement Monitoring). Instead, our results show that left cerebellum and premotor cortex are behaviorally relevant for implicit automatic evaluation of the temporal relationship between auditory and visual signals. By fine tuning an internal forward model stored within a premotor-cerebellar circuitry, sensory-motor experience can, thus, influence which sensory inputs are considered synchronous and integrated into a coherent percept.

The role of the cerebellum and premotor cortex in evaluating the temporal alignment of auditory and visual signals is further supported by our results obtained by combining dynamic causal modeling (DCM) with Bayesian model comparison and averaging. In the averaged optimal DCM, asynchronous speech modulates the connection from cerebellum → STS, whereas asynchronous music modulates primarily the connection from premotor cortex → STS. Both asynchronous music and speech enhance the effective connectivity to STS, turning inhibitory (i.e., negative) connections for audiovisual synchronous signals into excitatory (i.e., positive) connections for asynchronous signals. These findings suggest that the premotor cortex and cerebellum influence audiovisual temporal binding in STS by generating a prediction error signal through increased connectivity to STS. Importantly, comparing the modulatory effects of asynchronous speech and music across groups shows that the effective connectivity is also altered by musical expertise in a context-specific fashion. Although the modulatory effect of asynchronous speech is comparable across groups, the modulatory effect of asynchronous music is selectively enhanced for musicians relative to nonmusicians.

The specificity of the plastic changes for music in terms of (i) the behavioral audiovisual temporal integration window, (ii) audiovisual asynchrony blood oxygenation level-dependent (BOLD) responses, and (iii) effective connectivity strongly suggests sensorimotor rather than purely auditory learning mechanisms. However, future studies that formally compare the effects of pure audiovisual vs. audiovisual-motor training schemes on audiovisual synchrony perception are needed to further dissociate sensory-motor from purely audiovisual (i.e., sensory) learning effects. In the unisensory domains, the role of audio-motor experience has recently been highlighted in a magnetoencephalography (MEG) study showing a larger mismatch negativity for deviant musical structures after audio-motor (i.e., piano practicing) than pure auditory learning (39). Conversely, studies comparing motor skills that are and are not associated with sounds (e.g., pianists vs. athletes) are needed to confirm that motor skills per se do not afford higher sensitivity to the temporal misalignment of auditory and visual signals. Finally, although in the current study the benefit of musical expertise did not significantly generalize from music to speech, it is an open question whether more intensive training (e.g., in professional pianists) would induce generalization to some extent.

In conclusion, our behavioral and neural data jointly provide strong evidence for a context-specific mechanism, where piano practicing affords an internal forward model that enables more precise predictions of the relative timings of the auditory and visual signals. Asynchronous speech and music stimuli that violate the predictions from this internal forward model elicit an error signal in an STS-premotor-cerebellar circuitry that is fine tuned by sensory-motor experience. Collectively, our findings highlight intimate links between sensory-motor experience and audiovisual synchrony perception, where our interactions with the environment determine whether and how we integrate auditory and visual inputs into a unified percept.

Experimental Procedures

Subjects.

Thirty-seven healthy German native speakers (18 musicians and 19 nonmusicians) participated in the fMRI and psychophysics studies after giving informed consent. The musicians (M+) were amateur pianists who were selected based on the following criteria: (i) started piano practicing before the age of 12 y (mean ± SD = 7.9 ± 1.9 y), (ii) had practiced piano for at least 6 y (mean ± SD = 16.4 ± 5.6 y), and (iii) had practiced piano for at least 1 h (mean ± SD = 3.33 ± 1.69 h) per week over the past 3 y. The nonmusicians (M−) were selected based on having no piano practice and less than 3 mo of any musical training (16 subjects had no musical experience).

A detailed description is in SI Experimental Procedures, Subjects.

Stimuli.

Stimulus material was taken from close-up audiovisual recordings of (i) an actress’ face looking straight into the camera uttering short sentences or (ii) a man's right hand playing short piano melodies on a keyboard. The piano melodies were generated to match the rhythm and number of syllables of the speech sentences.

A detailed description is in SI Experimental Procedures, Stimuli.

Experimental Design and Procedures.

Psychophysics study.

Between 2 and 8 wk before the fMRI study, subjects were presented with music and speech stimuli (and sine wave analogs of speech that are not included in this report) at 13 levels of audiovisual stimulus onset asynchrony (AV-SOA; ±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Each stimulus was presented four times at each AV-SOA, amounting to 1,248 presentations in total, which were assigned to two sessions on separate days. The AV-SOA levels and stimulus classes were randomized. In a two-alternative forced-choice task, subjects judged each audiovisual stimulus as synchronous or asynchronous in a nonspeeded fashion.

fMRI study.

Subjects were presented with exactly the same stimulus material as in the psychophysical study. However, the AV-SOA levels were limited to synchronous (i.e., 0 AV offset) and asynchronous (±240 ms auditory or visual leading offset). This AV-SOA level was chosen because it corresponded to an average of 33.6% synchronous judgments in the psychophysical data (Fig. 1). Hence, the fMRI study used a 2 × 3 × 2 factorial design with the within-subject factors (i) audiovisual (a)synchrony (synchronous vs. asynchronous) and (ii) stimulus class [music vs. speech vs. sine wave speech analogs (SWS)] and the between-subjects factor group (musicians vs. nonmusicians).

Each stimulus was presented 24 times, amounting to 576 trials in total. The stimuli were presented in blocks of eight trials (stimulus onset asynchrony = 5.6 s) interleaved with 8-s fixation. Audiovisual (a)synchrony was pseudorandomized in an event-related fashion, and the stimulus class was manipulated across blocks.

To enable characterization of asynchrony effects in the motor and other neural systems unconfounded by motor responses and task-induced processes, subjects passively perceived the audiovisual stimuli while their performance was monitored by eye tracking. Indeed, eye tracking recordings inside the scanner confirmed that subjects fixated the visual stimuli equally in all conditions and in both groups (SI Results, Eye Movement Monitoring).

Experimental Setup and Stimulus Presentation.

A detailed description is in SI Experimental Procedures, Experimental Setup and Stimulus Presentation.

Behavioral Analysis: Psychophysics Study (Before fMRI Study).

To refrain from making any distributional assumptions, the psychometric function was estimated from the proportion of synchronous responses using local quadratic fitting as a nonparametric approach (40). The bandwidth for the local quadratic fitting was optimized individually for each subject in a cross-validation procedure. The audiovisual temporal integration window was estimated as the integral of the fitted psychometric function bounded by ±360 ms.
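For illustration, a minimal sketch of this procedure follows (an assumed re-implementation; the original analysis followed ref. 40): local quadratic regression with a Gaussian kernel, leave-one-out cross-validation over a bandwidth grid, and numerical integration over ±360 ms. The kernel choice, bandwidth grid, and normalization of the integral are our assumptions.

```python
# Sketch (assumed re-implementation of the nonparametric fit, ref. 40).
import numpy as np
from scipy.integrate import trapezoid

def local_quadratic(x, y, x0, h):
    """Weighted quadratic fit around x0 (Gaussian kernel, bandwidth h);
    returns the fitted value at x0. x, y: float arrays."""
    d = x - x0
    w = np.exp(-0.5 * (d / h) ** 2)
    X = np.column_stack([np.ones_like(d), d, d ** 2])
    beta = np.linalg.solve(X.T * w @ X, X.T @ (w * y))  # weighted LSQ
    return beta[0]

def loo_bandwidth(x, y, grid=np.linspace(60.0, 300.0, 9)):
    """Pick the bandwidth minimizing leave-one-out squared error."""
    def loo_err(h):
        return np.mean([(y[i] - local_quadratic(np.delete(x, i),
                                                np.delete(y, i),
                                                x[i], h)) ** 2
                        for i in range(len(x))])
    return min(grid, key=loo_err)

def temporal_integration_window(soas_ms, p_sync):
    """Fit, clip to [0, 1], integrate over +/-360 ms (normalization assumed)."""
    h = loo_bandwidth(soas_ms, p_sync)
    xs = np.linspace(-360.0, 360.0, 241)
    fit = np.clip([local_quadratic(soas_ms, p_sync, x0, h) for x0 in xs], 0, 1)
    return trapezoid(fit, xs) / 360.0
```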

fMRI Data Acquisition and Analysis.

Structural and functional images were acquired with a Siemens Trio TIM 3T scanner (SI Experimental Procedures, fMRI Data Acquisition).

The data were analyzed with statistical parametric mapping (SPM8; Wellcome Center for Neuroimaging). Scans from each subject were realigned, unwarped, and spatially normalized into Montreal Neurological Institute (MNI) space using parameters from segmentation of the T1 structural image (41), resampled to 2 × 2 × 2 mm³ voxels, and spatially smoothed with a Gaussian kernel of 8-mm full width at half maximum (FWHM). The time series in each voxel was high-pass filtered to 1/128 Hz. The fMRI experiment was modeled in an event-related fashion, with regressors entered into the design matrix after convolving each event-related unit impulse (indexing stimulus onset) with a canonical hemodynamic response function and its first temporal derivative. In addition to modeling the nine conditions in our 3 × 3 factorial design (auditory and visual leading asynchronous trials were modeled separately), the statistical model included six realignment parameters as nuisance covariates to account for residual motion artifacts. Condition-specific effects for each subject were estimated according to the general linear model and passed to a second-level analysis as contrasts. This process involved creating four contrast images that compared (i) synchronous speech, (ii) asynchronous speech, (iii) synchronous music, and (iv) asynchronous music relative to fixation, summed over the six sessions for each subject, and entering them into a second-level ANOVA or regression models (see below). This second-level ANOVA modeled the eight conditions (i.e., four conditions each for the M− and M+ groups). Inferences were made at the second level to allow for a random effects analysis and inferences at the population level (42).
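As a simplified illustration of this first-level model, the sketch below (an assumed numpy/scipy re-implementation, not SPM8 itself) builds one HRF-convolved regressor per condition from stimulus onsets, using a conventional double-gamma approximation of the canonical hemodynamic response; the parameter values shown are common defaults, not taken from the paper.

```python
# Sketch of the event-related GLM regressors (assumed re-implementation;
# the actual analysis used SPM8's canonical HRF plus temporal derivative).
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(tr, duration=32.0):
    """Conventional double-gamma HRF sampled at the repetition time (TR)."""
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0  # response minus undershoot
    return h / np.abs(h).sum()

def condition_regressor(onsets_s, n_scans, tr):
    """Unit impulses at stimulus onsets (assumed to lie within the run)
    convolved with the canonical HRF."""
    u = np.zeros(n_scans)
    u[(np.asarray(onsets_s) // tr).astype(int)] = 1.0
    return np.convolve(u, double_gamma_hrf(tr))[:n_scans]

# The full design matrix stacks one such column per condition (nine in the
# 3 x 3 within-subject design) plus six realignment nuisance covariates.
```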

At the random effects level, we first tested for (i) the main effect of audiovisual (a)synchrony (synchronous > asynchronous for speech and music averaged across groups and vice versa) and (ii) interactions between audiovisual (a)synchrony and stimulus class. We then evaluated the effect of musical expertise separately for speech and music. Separately for each of the two stimulus classes, we tested for (i) the main effect of asynchrony, (ii) asynchrony effects that are common to musicians and nonmusicians [i.e., a (conjunction-null) conjunction analysis], and (iii) asynchrony effects that differ between musicians and nonmusicians [i.e., the interactions between audiovisual (a)synchrony and group].

Finally, we characterized the relationship between perceptual and neural asynchrony measures. For this characterization, the activation differences (asynchronous − synchronous) for music (resp. for speech) were entered into a multiple regression analysis. Subjects’ perceptual asynchrony sensitivity for music (resp. for speech), measured as the proportion of synchronous responses for synchronous minus asynchronous conditions, served as predictor of the corresponding neural asynchrony effects (i.e., the activation difference asynchronous − synchronous for music), separately for musicians and nonmusicians. We then investigated whether the neural asynchrony effects were positively (and, for completeness, also negatively) predicted by the perceptual asynchrony sensitivity (i) averaged across the two groups and (ii) differently for the two groups (i.e., the interaction between behavioral regression and group).

Search Volume Constraints.

The search space for the main and interaction effects of audiovisual (a)synchrony was limited to the cortical audiovisual processing system (speech + music > fixation; P < 0.05, whole-brain corrected, extent threshold > 100 voxels) combined with the entire cerebellum as our a priori search volume of interest (i.e., including 40,003 voxels).

To identify asynchrony effects that were common to musicians and nonmusicians, each asynchrony effect for one group was tested within a search volume constrained by the corresponding contrast in the other group. This approach is effectively equivalent to a (conjunction-null) conjunction analysis (i.e., a logical AND).

For the regression analyses (i.e., neural asynchronous − synchronous differences against behavioral synchronous − asynchronous differences) for speech (resp. for music), we constrained the search space to voxels showing an asynchrony effect for speech (resp. for music) thresholded at P < 0.001, uncorrected and with an extent threshold > 20 voxels (speech = 277 voxels; music = 336 voxels). We applied this additional constraint to investigate whether responses in regions of the asynchrony system are significantly predicted by subjects’ individual perceptual sensitivity to audiovisual asynchrony (this constraint does not bias our inference, because the contrasts are orthogonal).

Unless otherwise stated, we report activations at P < 0.05 at the peak level corrected for multiple comparisons within the particular search volume.

For illustration purposes only (i.e., not for statistical inference), activations are displayed using a height threshold of P < 0.001, uncorrected and extent threshold > 20 voxels and are inclusively masked with the search mask as described above.

Effective Connectivity Analysis: DCM.

For each subject, 36 DCMs (43) were constructed. Each DCM included the three regions that showed greater asynchrony effects for music in musicians than nonmusicians: (i) the left anterior premotor cortex (x = −42, y = +20, z = +50), (ii) the right posterior STS (x = +62, y = −40, z = −4), and (iii) the left cerebellum (x = −32, y = −60, z = −36) (Fig. 4B). Given its prominent role in audiovisual integration, the right posterior STS was chosen as the audiovisual input region. The three regions were bidirectionally connected. The timings of the onsets were individually adjusted for each region to match the specific time of slice acquisition. All audiovisual speech and music stimuli entered as extrinsic inputs to posterior STS. Holding the number of parameters and the intrinsic and extrinsic connectivity structure constant, the 6 × 6 model space (36 DCMs) factorially manipulated the connection that was modulated by (i) music asynchrony and (ii) speech asynchrony (Fig. 5B).
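The factorial model space itself is simple bookkeeping: three bidirectionally connected regions give six directed connections, and each of the 36 DCMs assigns the music asynchrony and speech asynchrony modulations independently to one of them. A sketch (the region labels are shorthand):

```python
# Sketch: enumerate the 6 x 6 factorial DCM model space described above.
from itertools import product

regions = ["STS", "PM", "CB"]  # posterior STS, premotor cortex, cerebellum
connections = [(src, dst) for src in regions for dst in regions if src != dst]
assert len(connections) == 6   # six directed connections

model_space = [{"music_modulates": m, "speech_modulates": s}
               for m, s in product(connections, connections)]
assert len(model_space) == 36  # the 36 candidate DCMs
```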

Region-specific time series (concatenated over the six sessions and adjusted for effects of interest) comprised the first eigenvariate of all voxels within a 4-mm-radius sphere centered on the subject-specific peak in the asynchronous > synchronous for speech + music contrast. The subject-specific peak was uniquely identified as the positively valued maximum within the asynchronous > synchronous for speech + music contrast in a particular subject in a 10-mm-radius sphere centered on the peak coordinates from the group random effects analysis. For our input region posterior STS, we imposed the additional constraint that the effect for speech and music > fixation exceeded a t value of 1.65. In cases where we could not identify a maximum given these constraints, we selected the random effects maximum for this particular region and subject (this procedure was applied in five subjects for one region each).
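For readers unfamiliar with the term, the first eigenvariate is the dominant temporal mode of the voxels within the sphere. The sketch below is an assumed re-implementation of the usual SVD-based convention; SPM's exact scaling and sign conventions may differ.

```python
# Sketch of first-eigenvariate extraction (assumed re-implementation).
import numpy as np

def first_eigenvariate(Y):
    """Y: (n_timepoints, n_voxels) voxel time series within the sphere."""
    Yc = Y - Y.mean(axis=0)                    # remove each voxel's mean
    U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
    u = U[:, 0] * S[0] / np.sqrt(Y.shape[1])   # scale to typical voxel units
    sgn = np.sign(Vt[0].sum())                 # align sign with mean voxel
    return u * (sgn if sgn != 0 else 1.0)
```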

Bayesian Model Comparison and Averaging.

To determine the most likely of the 36 DCMs given the observed data from all subjects, we applied Bayesian model selection separately for musicians and nonmusicians in a random effects group analysis to avoid distortions from outliers (44). Bayesian model selection was implemented in a hierarchical Bayesian model that estimates the frequencies with which models are used in each group. Gibbs sampling was used to estimate the posterior distribution over these frequencies (45). To characterize our Bayesian model selection results at the random effects level, we report the exceedance probability of one model being more likely than any other model tested.
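Concretely, once the posterior Dirichlet parameters over model frequencies have been estimated (the Gibbs sampling step above; refs. 44 and 45), the exceedance probabilities can be approximated by Monte Carlo sampling. A sketch under that assumption:

```python
# Sketch: exceedance probabilities from a Dirichlet posterior over model
# frequencies (alpha assumed to come from the hierarchical/Gibbs step).
import numpy as np

def exceedance_probabilities(alpha, n_samples=100_000, seed=0):
    """alpha: (n_models,) Dirichlet parameters. Returns, for each model,
    the probability that its frequency exceeds that of all other models."""
    rng = np.random.default_rng(seed)
    r = rng.dirichlet(alpha, size=n_samples)  # sampled model frequencies
    winners = np.argmax(r, axis=1)            # most frequent model per sample
    return np.bincount(winners, minlength=len(alpha)) / n_samples
```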

Because Bayesian model selection of individual models can become brittle when a large number of models is considered (45), we also used family-level inference. Given our factorial 6 × 6 model space, we compared the six model families that differ in the connection that is modulated by (i) music asynchrony or (ii) speech asynchrony. As reported in Results, the optimal models differed for the musician and nonmusician groups. To enable inference on and comparison of the connectivity parameters (given a particular model) across musicians and nonmusicians, we used Bayesian model averaging, which computes an estimate of each model parameter (e.g., connection strength) by averaging the parameters across models weighted by the posterior probability of each model. Using Bayesian model averaging at the group level, we obtained a sample-based representation of the posterior density for each intrinsic, extrinsic, or modulatory connection parameter. From this sample-based posterior density, we computed the posterior probability of a connection being greater than zero (equivalently, smaller than zero if the connection strength is negative). For the modulatory connections, we also computed the posterior probability of a connection strength being increased (resp. decreased) in the musician relative to the nonmusician group.
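The averaging and the posterior probabilities of signed effects can be sketched as follows (assumed implementation; the inputs are hypothetical posterior samples per model and posterior model probabilities):

```python
# Sketch: Bayesian model averaging of one connection parameter and the
# posterior probability of a positive (or negative) effect.
import numpy as np

def bma_samples(param_samples, model_post, n=10_000, seed=0):
    """param_samples: list of 1D arrays of posterior samples (one per model);
    model_post: (n_models,) posterior model probabilities summing to 1."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(model_post), size=n, p=model_post)
    return np.array([rng.choice(param_samples[m]) for m in picks])

def prob_positive(samples):
    return float(np.mean(samples > 0))  # posterior P(connection > 0)
```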

Model comparison and statistical analysis of connectivity parameters of the optimal averaged model enable us to address the following two questions. First, we asked which connections in the STS-premotor-cerebellar circuitry are modulated by speech and music asynchrony to mediate the regional prediction error signals. Second, given the averaged optimal model, we asked whether any modulatory effects on connection strength differ across the two groups. In other words, we investigated whether musical expertise shapes the effective connectivity in a context-specific fashion. Specifically, we hypothesized that asynchronous music would enhance the connection strength more strongly in musicians than nonmusicians.

Supplementary Material

Supporting Information

Acknowledgments

We thank Kamila Zychaluk and David H. Foster for useful discussions and Fabian Sinz and Mario Kleiner for help with stimulus generation. This study was funded by the Max Planck Society and is part of the research program of the Bernstein Center for Computational Neuroscience, Tuebingen, funded by the German Federal Ministry of Education and Research (BMBF; FKZ: 01GQ1002).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. R.J.Z. is a guest editor invited by the Editorial Board.

See Author Summary on page 20295.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1115267108/-/DCSupplemental.

References

  • 1.Münte TF, Altenmüller E, Jäncke L. The musician's brain as a model of neuroplasticity. Nat Rev Neurosci. 2002;3:473–478. doi: 10.1038/nrn843. [DOI] [PubMed] [Google Scholar]
  • 2.Zatorre RJ, Chen JL, Penhune VB. When the brain plays music: Auditory-motor interactions in music perception and production. Nat Rev Neurosci. 2007;8:547–558. doi: 10.1038/nrn2152. [DOI] [PubMed] [Google Scholar]
  • 3.Petrini K, et al. Multisensory integration of drumming actions: Musical expertise affects perceived audiovisual asynchrony. Exp Brain Res. 2009;198:339–352. doi: 10.1007/s00221-009-1817-2. [DOI] [PubMed] [Google Scholar]
  • 4.Powers AR, 3rd, Hillock AR, Wallace MT. Perceptual training narrows the temporal window of multisensory binding. J Neurosci. 2009;29:12265–12274. doi: 10.1523/JNEUROSCI.3501-09.2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
5. Bushara KO, et al. Neural correlates of cross-modal binding. Nat Neurosci. 2003;6:190–195. doi: 10.1038/nn993.
6. Miller LM, D'Esposito M. Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. J Neurosci. 2005;25:5884–5893. doi: 10.1523/JNEUROSCI.0896-05.2005.
7. Noesselt T, et al. Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. J Neurosci. 2007;27:11431–11441. doi: 10.1523/JNEUROSCI.2252-07.2007.
8. Lewis RK, Noppeney U. Audiovisual synchrony improves motion discrimination via enhanced connectivity between early visual and auditory areas. J Neurosci. 2010;30:12329–12339. doi: 10.1523/JNEUROSCI.5745-09.2010.
9. Musacchia G, Sams M, Skoe E, Kraus N. Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proc Natl Acad Sci USA. 2007;104:15894–15898. doi: 10.1073/pnas.0701498104.
10. Wong PC, Skoe E, Russo NM, Dees T, Kraus N. Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat Neurosci. 2007;10:420–422. doi: 10.1038/nn1872.
11. Kraus N, Chandrasekaran B. Music training for the development of auditory skills. Nat Rev Neurosci. 2010;11:599–605. doi: 10.1038/nrn2882.
12. O'Reilly JX, Mesulam MM, Nobre AC. The cerebellum predicts the timing of perceptual events. J Neurosci. 2008;28:2252–2260. doi: 10.1523/JNEUROSCI.2742-07.2008.
13. Ramnani N. The primate cortico-cerebellar system: Anatomy and function. Nat Rev Neurosci. 2006;7:511–522. doi: 10.1038/nrn1953.
14. Wolpert DM, Miall RC, Kawato M. Internal models in the cerebellum. Trends Cogn Sci. 1998;2:338–347. doi: 10.1016/s1364-6613(98)01221-2.
15. Friston K. The free-energy principle: A unified brain theory? Nat Rev Neurosci. 2010;11:127–138. doi: 10.1038/nrn2787.
16. Bengtsson SL, Ullén F. Dissociation between melodic and rhythmic processing during piano performance from musical scores. Neuroimage. 2006;30:272–284. doi: 10.1016/j.neuroimage.2005.09.019.
17. Koch G, et al. Repetitive TMS of cerebellum interferes with millisecond time processing. Exp Brain Res. 2007;179:291–299. doi: 10.1007/s00221-006-0791-1.
18. Lewis PA, Miall RC. Brain activation patterns during measurement of sub- and supra-second intervals. Neuropsychologia. 2003;41:1583–1592. doi: 10.1016/s0028-3932(03)00118-0.
19. Patel AD, Iversen JR. The linguistic benefits of musical abilities. Trends Cogn Sci. 2007;11:369–372. doi: 10.1016/j.tics.2007.08.003.
20. Werner S, Noppeney U. Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. J Neurosci. 2010;30:2662–2675. doi: 10.1523/JNEUROSCI.5091-09.2010.
21. Beauchamp MS, Argall BD, Bodurka J, Duyn JH, Martin A. Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nat Neurosci. 2004;7:1190–1192. doi: 10.1038/nn1333.
22. Chen JL, Penhune VB, Zatorre RJ. Listening to musical rhythms recruits motor regions of the brain. Cereb Cortex. 2008;18:2844–2854. doi: 10.1093/cercor/bhn042.
23. Lahav A, Saltzman E, Schlaug G. Action representation of sound: Audiomotor recognition network while listening to newly acquired actions. J Neurosci. 2007;27:308–314. doi: 10.1523/JNEUROSCI.4822-06.2007.
24. Bengtsson SL, et al. Listening to rhythms activates motor and premotor cortices. Cortex. 2009;45:62–71. doi: 10.1016/j.cortex.2008.07.002.
25. Grahn JA, Rowe JB. Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. J Neurosci. 2009;29:7540–7548. doi: 10.1523/JNEUROSCI.2018-08.2009.
26. Chen JL, Penhune VB, Zatorre RJ. Moving on time: Brain network for auditory-motor synchronization is modulated by rhythm complexity and musical training. J Cogn Neurosci. 2008;20:226–239. doi: 10.1162/jocn.2008.20018.
27. Chen JL, Zatorre RJ, Penhune VB. Interactions between auditory and dorsal premotor cortex during synchronization to musical rhythms. Neuroimage. 2006;32:1771–1781. doi: 10.1016/j.neuroimage.2006.04.207.
28. Baumann S, et al. A network for audio-motor coordination in skilled pianists and non-musicians. Brain Res. 2007;1161:65–78. doi: 10.1016/j.brainres.2007.05.045.
29. Haslinger B, et al. Transmodal sensorimotor networks during action observation in professional pianists. J Cogn Neurosci. 2005;17:282–293. doi: 10.1162/0898929053124893.
30. Ivry RB, Spencer RM. The neural representation of time. Curr Opin Neurobiol. 2004;14:225–232. doi: 10.1016/j.conb.2004.03.013.
31. Penhune VB, Zatorre RJ, Evans AC. Cerebellar contributions to motor timing: A PET study of auditory and visual rhythm reproduction. J Cogn Neurosci. 1998;10:752–765. doi: 10.1162/089892998563149.
32. Del Olmo MF, Cheeran B, Koch G, Rothwell JC. Role of the cerebellum in externally paced rhythmic finger movements. J Neurophysiol. 2007;98:145–152. doi: 10.1152/jn.01088.2006.
33. Moberget T, et al. Detecting violations of sensory expectancies following cerebellar degeneration: A mismatch negativity study. Neuropsychologia. 2008;46:2569–2579. doi: 10.1016/j.neuropsychologia.2008.03.016.
34. Rao SM, et al. Distributed neural systems underlying the timing of movements. J Neurosci. 1997;17:5528–5535. doi: 10.1523/JNEUROSCI.17-14-05528.1997.
35. Spencer RM, Verstynen T, Brett M, Ivry R. Cerebellar activation during discrete and not continuous timed movements: An fMRI study. Neuroimage. 2007;36:378–387. doi: 10.1016/j.neuroimage.2007.03.009.
36. Spencer RM, Zelaznik HN, Diedrichsen J, Ivry RB. Disrupted timing of discontinuous but not continuous movements by cerebellar lesions. Science. 2003;300:1437–1439. doi: 10.1126/science.1083661.
37. Mauk MD, Buonomano DV. The neural basis of temporal processing. Annu Rev Neurosci. 2004;27:307–340. doi: 10.1146/annurev.neuro.27.070203.144247.
38. Bangert M, et al. Shared networks for auditory and motor processing in professional pianists: Evidence from fMRI conjunction. Neuroimage. 2006;30:917–926. doi: 10.1016/j.neuroimage.2005.10.044.
39. Lappe C, Herholz SC, Trainor LJ, Pantev C. Cortical plasticity induced by short-term unimodal and multimodal musical training. J Neurosci. 2008;28:9632–9639. doi: 10.1523/JNEUROSCI.2254-08.2008.
40. Zychaluk K, Foster DH. Model-free estimation of the psychometric function. Atten Percept Psychophys. 2009;71:1414–1425. doi: 10.3758/APP.71.6.1414.
41. Ashburner J, Friston KJ. Unified segmentation. Neuroimage. 2005;26:839–851. doi: 10.1016/j.neuroimage.2005.02.018.
42. Friston KJ, et al. Statistical parametric maps in functional imaging: A general linear approach. Hum Brain Mapp. 1994;2:189–210.
43. Friston KJ, Harrison L, Penny W. Dynamic causal modelling. Neuroimage. 2003;19:1273–1302. doi: 10.1016/s1053-8119(03)00202-7.
44. Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ. Bayesian model selection for group studies. Neuroimage. 2009;46:1004–1017. doi: 10.1016/j.neuroimage.2009.03.025.
45. Penny WD, et al. Comparing families of dynamic causal models. PLoS Comput Biol. 2010;6:e1000709. doi: 10.1371/journal.pcbi.1000709.
Proc Natl Acad Sci U S A. 2011 Dec 20;108(51):20295–20296.

Author Summary

Practicing a musical instrument is a rich multisensory experience involving the integration of visual, auditory, and tactile inputs with motor responses (1). Although it is well established that music training induces extensive plasticity in the brain's auditory and motor systems, relatively little is known about how it affects the temporal binding of signals from multiple senses (i.e., auditory and visual signals) at the neural level. Because musical performance requires precise timing, musical expertise may particularly fine tune the temporal window within which sensory signals need to co-occur to be considered synchronous and bound into a single coherent percept. This study investigated how long-term piano practicing shapes the neural processes underlying the temporal binding of auditory and visual signals. Our findings highlight the importance of sensory-motor experience (such as piano practicing) in fine tuning how we temporally bind auditory and visual inputs into a unified percept.

We presented subjects with synchronous and asynchronous speech and music as two highly complex classes of stimuli that are linked to different motor effectors (mouth vs. hand). Comparing the effect of musical expertise on the synchrony perception of speech and music allowed us to dissociate (i) generic and (ii) context-specific mechanisms by which piano practicing can influence audiovisual integration. In support of generic mechanisms, music training has been shown to induce changes in the auditory processing system, leading to listening benefits that generalize from music to speech processing (2). Hence, if music training induces a general sensitization to audiovisual temporal (mis)alignment, we would expect a narrower temporal binding window and increased neural audiovisual (a)synchrony effects in musicians for both music and speech. Context-specific mechanisms, in contrast, may rely on internal forward models that are adapted to specific motor tasks and are thought to be represented in a neural system involving the premotor cortex and cerebellum (3). Specifically, piano practicing may fine tune a forward model that maps from the motor plan of piano playing onto visible finger movements and concurrent sounds. Such a model permits more precise predictions about the relative timing of auditory and visual signals, leading to a narrower temporal binding window in musicians for music but not for speech. Asynchronous music stimuli that violate these predictions should then increase neural activation, signaling a prediction error (a toy illustration of this precision account follows below).
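
To make the precision intuition concrete, here is a minimal sketch in Python. It is entirely our own illustration, not the authors' model: the forward model's predictive precision (1/σ²) scales how strongly a given audiovisual lag registers as a prediction error, so a more precise model (hypothetically, the pianist's) produces a larger error signal for the same lag. All numbers are invented.

```python
# Conceptual sketch (our illustration, not the authors' model): a forward
# model predicts the auditory onset from the motor/visual event; its
# predictive precision determines how strongly an audiovisual lag
# registers as a prediction error.
import numpy as np

def prediction_error(lag_ms, sigma_ms):
    """Squared timing error weighted by predictive precision (1/sigma^2)."""
    return (lag_ms ** 2) / (sigma_ms ** 2)

lags = np.array([0.0, 50.0, 100.0, 200.0])   # audiovisual lags (ms), invented
for sigma, label in [(80.0, "musician (precise model)"),
                     (160.0, "nonmusician (coarse model)")]:
    print(label, np.round(prediction_error(lags, sigma), 2))
```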

Eighteen amateur pianists and nineteen nonmusicians participated in the study. During the psychophysical part of the study, the subjects explicitly judged the audiovisual synchrony of recorded speech sentences and piano melodies at various stimulus onset asynchronies. We estimated each subject's temporal binding window from the proportion of synchronous responses at different asynchrony levels. As shown in Fig. P1A, the temporal binding window was narrower for musicians than nonmusicians selectively for music but not speech. These results provide behavioral evidence that piano practicing fine tunes audiovisual synchrony perception in a context-specific manner.
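
As a minimal sketch of how a temporal binding window can be summarized, the following Python snippet fits a Gaussian to the proportion of "synchronous" responses across stimulus onset asynchronies (SOAs) and reports the full width at half maximum. The full article used model-free psychometric estimation (ref. 40 of the main reference list), so the Gaussian shape, the SOA grid, and the response rates below are purely illustrative assumptions.

```python
# Minimal sketch (not the authors' code): summarizing a temporal binding
# window by fitting a Gaussian to the proportion of "synchronous" responses
# as a function of SOA. All data are synthetic.
import numpy as np
from scipy.optimize import curve_fit

def gaussian(soa, amp, mu, sigma):
    """Proportion of 'synchronous' responses as a function of SOA (ms)."""
    return amp * np.exp(-((soa - mu) ** 2) / (2.0 * sigma ** 2))

# Hypothetical SOAs (ms; negative = auditory leading) and response rates.
soas = np.array([-300, -200, -100, 0, 100, 200, 300], dtype=float)
p_sync = np.array([0.10, 0.35, 0.80, 0.95, 0.75, 0.30, 0.08])

(amp, mu, sigma), _ = curve_fit(gaussian, soas, p_sync, p0=[1.0, 0.0, 100.0])

# Full width at half maximum as one summary of the binding window.
fwhm = 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma
print(f"peak at {mu:.0f} ms, window width (FWHM) ~ {fwhm:.0f} ms")
```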

Fig. P1.

(A) The proportion of synchronous responses (across-subjects mean ± SEM) at different levels of asynchrony for speech (Left) and music (Right) in musicians (black, M+) and nonmusicians (gray, M−), from the psychophysical experiment conducted before the fMRI study. (B, i) Asynchrony effects for music that are enhanced in musicians relative to nonmusicians in the left cerebellum, left premotor cortex, and right posterior superior temporal sulcus. (B, ii) Scatter plot depicting the regression of the left premotor neural asynchrony effect for music on perceptual asynchrony sensitivity in musicians (black, M+) and nonmusicians (gray, M−).

Using a standard technique called functional MRI (fMRI), we next investigated how musical expertise shapes the neural mechanisms underlying audiovisual temporal binding and (a)synchrony perception. The same speech and music stimuli were presented synchronously and asynchronously with a temporal offset that subjects reported as asynchronous in 67% of the trials. The subjects perceived the stimuli without being engaged in any explicit task, enabling us to evaluate automatic (a)synchrony effects in motor and premotor regions of the brain without potential interference from motor responses or task-related processes. First, we identified the neural system that evaluates the temporal (mis)alignment of auditory and visual signals by comparing asynchronous and synchronous conditions. Asynchronous music and speech signals commonly increased neural activation in a widespread system encompassing the following brain regions: bilateral superior temporal sulci, occipital and fusiform gyri, and premotor and cerebellar cortices. Thus, audiovisual asynchrony of music and speech is detected not only in the sensory processing areas and classical audiovisual integration areas such as the superior temporal sulcus (STS) but also in a premotor-cerebellar circuitry.
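
For readers unfamiliar with fMRI contrasts, the following self-contained Python sketch shows the logic of a voxel-wise general linear model comparison of asynchronous vs. synchronous blocks. It mimics the SPM-style analysis cited in the full article's methods but is not the study's pipeline; the design, HRF shape, and BOLD time series are all synthetic assumptions.

```python
# Illustrative sketch (not the study's pipeline): a single-voxel GLM
# contrast of asynchronous vs. synchronous conditions. All values invented.
import numpy as np
from scipy.stats import gamma

TR, n_scans = 2.0, 200
t = np.arange(n_scans) * TR

def hrf(times):
    """Rough canonical double-gamma haemodynamic response function."""
    return gamma.pdf(times, 6) - 0.35 * gamma.pdf(times, 16)

def regressor(onsets, dur):
    """Boxcar for block onsets (s) convolved with the HRF."""
    box = np.zeros(n_scans)
    for onset in onsets:
        box[(t >= onset) & (t < onset + dur)] = 1.0
    return np.convolve(box, hrf(np.arange(0.0, 32.0, TR)))[:n_scans]

# Hypothetical block onsets for the two conditions.
x_async = regressor([20, 120, 220], dur=20)
x_sync = regressor([70, 170, 270], dur=20)
X = np.column_stack([x_async, x_sync, np.ones(n_scans)])

# Synthetic single-voxel BOLD signal with a larger asynchrony response.
rng = np.random.default_rng(0)
y = 1.5 * x_async + 1.0 * x_sync + rng.normal(0.0, 1.0, n_scans)

beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
c = np.array([1.0, -1.0, 0.0])          # contrast: asynchronous > synchronous
dof = n_scans - np.linalg.matrix_rank(X)
var = res[0] / dof
t_stat = (c @ beta) / np.sqrt(var * (c @ np.linalg.pinv(X.T @ X) @ c))
print(f"asynchronous > synchronous: t = {t_stat:.2f}")
```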

Second, we investigated how musical expertise affects neural responses to asynchronous speech and music. Mirroring the context-specific effects seen at the behavioral level, piano practicing modulated neural asynchrony effects selectively for music but not for speech. In line with humans' generic expertise in speech, musicians and nonmusicians showed similar responses for asynchronous relative to synchronous speech, whereas asynchrony responses to music were amplified in musicians, indicating that the neural asynchrony effects depended on prior sensory-motor experience (Fig. P1B, i).

The functional relevance of these asynchrony effects was corroborated by additional regression analyses that used each subject's perceptual asynchrony sensitivity to predict the asynchrony-induced activation enhancement (Fig. P1B, ii). These analyses revealed that the better musicians were at distinguishing synchronous from asynchronous music stimuli, the greater their neural asynchrony effects in the left cerebellum and premotor cortex. Collectively, our results suggest that prior sensory-motor experience induces activations in an STS–premotor-cerebellar circuitry as a supplementary mechanism for determining the temporal (mis)alignment of auditory and visual signals. Indeed, previous studies have implicated the cerebellum and premotor cortex in motor and perceptual timing, particularly in the millisecond range (4).
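
The logic of this across-subject analysis can be sketched in a few lines: regress each subject's neural asynchrony effect (the asynchronous-minus-synchronous parameter estimate) on his or her perceptual asynchrony sensitivity. The subject values below are fabricated purely for illustration; the study's actual regressors and group sizes are given in the full article.

```python
# Minimal sketch of the across-subject regression idea: does perceptual
# sensitivity to audiovisual asynchrony predict the size of the neural
# asynchrony effect? All subject values are fabricated for illustration.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
n_subjects = 18                                  # e.g., the musician group
sensitivity = rng.normal(1.5, 0.5, n_subjects)   # hypothetical d' values
neural_effect = 0.8 * sensitivity + rng.normal(0.0, 0.3, n_subjects)

fit = linregress(sensitivity, neural_effect)
print(f"slope = {fit.slope:.2f}, r = {fit.rvalue:.2f}, p = {fit.pvalue:.3f}")
```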

In conclusion, our behavioral and fMRI results provide convergent evidence that piano practicing, by refining a context-specific forward model, enables more precise predictions about the relative timing of auditory and visual signals. Asynchronous stimuli that violate the model's predictions elicit an error signal in an STS–premotor-cerebellar circuitry that is fine tuned through sensory-motor experience. Hence, music training confers greater sensitivity to the (a)synchrony of auditory and visual signals specifically for music. Collectively, our findings highlight intimate links between action production and audiovisual synchrony perception: our interactions with the environment determine whether and how we bind auditory and visual signals into a coherent percept of our natural environment.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

See full research article on page E1441 of www.pnas.org.

Cite this Author Summary as: PNAS 10.1073/pnas.1115267108.

References

1. Zatorre RJ, Chen JL, Penhune VB. When the brain plays music: Auditory-motor interactions in music perception and production. Nat Rev Neurosci. 2007;8:547–558. doi: 10.1038/nrn2152.
2. Musacchia G, Sams M, Skoe E, Kraus N. Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proc Natl Acad Sci USA. 2007;104:15894–15898. doi: 10.1073/pnas.0701498104.
3. Wolpert DM, Miall RC, Kawato M. Internal models in the cerebellum. Trends Cogn Sci. 1998;2:338–347. doi: 10.1016/s1364-6613(98)01221-2.
4. O'Reilly JX, Mesulam MM, Nobre AC. The cerebellum predicts the timing of perceptual events. J Neurosci. 2008;28:2252–2260. doi: 10.1523/JNEUROSCI.2742-07.2008.
