Abstract
Human communication relies on the ability to process linguistic structure and to map words and utterances onto our environment. Furthermore, as what we communicate is often not directly encoded in our language (e.g. in the case of irony, jokes or indirect requests), we need to extract additional cues to infer the beliefs and desires of our conversational partners. Although the functional interplay between language and the ability to mentalise has been discussed in theoretical accounts in the past, the neurobiological underpinnings of these dynamics are currently not well understood. Here, we address this issue using functional imaging (fMRI). Participants listened to question-reply dialogues. In these dialogues, a reply is interpreted as a direct reply, an indirect reply or a request for action, depending on the question. We show that inferring meaning from indirect replies engages parts of the mentalising network (mPFC) while requests for action also activate the cortical motor system (IPL). Subsequent connectivity analysis using Dynamic Causal Modelling (DCM) revealed that this pattern of activation is best explained by an increase in effective connectivity from the mentalising network (mPFC) to the action system (IPL). These results are an important step towards a more integrative understanding of the neurobiological basis of indirect speech processing.
Keywords: neuropragmatics, theory of mind, mentalising, language comprehension, semantics, embodied cognition, dynamic causal modelling
Introduction
Human communication involves understanding language on multiple levels: on one level, listeners must process the linguistic information contained in an utterance, that is, parse grammatical structure and map word forms onto referents in the real world. On another level, much of what we communicate to each other in a conversation is not actually encoded verbally (e.g. irony often involves saying exactly the opposite of what one means), and listeners are therefore tasked with deciphering what speakers mean beyond the cues afforded by the linguistic components of an utterance. Many theoretical accounts of how language meaning is translated into speaker meaning exist (Grice, 1975; Levinson, 2000; Wilson & Sperber, 2004); however, the neural underpinnings of pragmatic inferencing remain unclear.
Current models of language comprehension suggest that naturalistic language use relies on networks that extend beyond classical perisylvian language areas (Catani & Bambini, 2014; Fedorenko & Thompson-Schill, 2014). Brain areas involved in perception and action, executive control, memory and mentalising have all been shown to be active during language comprehension tasks (Rüschemeyer et al., 2007; Ferstl et al., 2008; Rueschemeyer et al., 2010b; Van Ackeren et al., 2012; Fedorenko & Thompson-Schill, 2014; van Ackeren & Rueschemeyer, 2014; Van Ackeren et al., 2014; Nijhof & Willems, 2015). Although there is abundant evidence for the involvement of these high-level cognitive networks in deciphering speaker meaning, little is known about the dynamic interactions between these networks. In the current study, we investigate the neural correlates of processing direct and indirect speech in order to uncover how language, perception/action and mentalising areas interact during on-line speech comprehension. In particular, we are interested in how beliefs about others’ intentions influence the activation of language-based semantic meaning in the brain. Semantic meaning is pinpointed in the current study by manipulating whether or not an utterance describes action content: focusing on this type of lexical-semantic information allows us to generate specific hypotheses about what neural correlates we expect to see when specific semantic content is processed.
Previous studies have shown that language comprehension recruits distributed perceptual-motor networks that are involved in the retrieval of lexical-semantic knowledge (Goldberg et al., 2006; Rüschemeyer et al., 2007; Simmons et al., 2007; Barsalou, 2008; Pulvermüller & Fadiga, 2010; Rueschemeyer et al., 2010b; Glenberg & Gallese, 2012; Van Dam et al., 2012; van Ackeren & Rueschemeyer, 2014; Van Ackeren et al., 2014). For example, the comprehension of words that denote actions, such as ‘grasp’ or ‘hit’ have been shown to activate fronto-parietal areas associated with planning and executing hand actions (Hauk et al., 2004; Postle et al., 2008; Rueschemeyer et al., 2010a; Van Dam et al., 2010). Embodied theories of language suggest that modality-specific responses result from covert simulation of past perceptual experiences with words’ referents (Zwaan, 2003; Barsalou, 2008). While the exact contribution of sensorimotor areas to lexical-semantic processing is still debated (Mahon & Caramazza, 2008; Toni et al., 2008), many studies have demonstrated that activation in modality-specific regions is at least a marker for the retrieval of modality-specific semantic content.
Recent accounts have criticised that the scope of the embodied framework is currently limited to the understanding of coded meaning, that is, semantic content directly represented by the words in an utterance (Basnáková et al., 2013; Hagoort, 2013). This type of information is contrasted with speaker meaning, which reflects the speech act, or message the speaker is trying to communicate (Grice, 1975; Holtgraves, 1999). However, naturalistic language use is replete with instances in which linguistic and speaker meaning diverge. Common examples are idiomatic expressions (e.g. ‘he kicked the bucket’) (Boulenger et al., 2009; Raposo et al., 2009), indirect replies (Basnáková et al., 2013; Jang et al., 2013), and indirect requests (Van Ackeren et al., 2012). While the evidence for neural motor activation during idiomatic expressions comprising action words is still debated, indirect requests for action have been shown to reliably activate the neural motor system in multiple studies using different imaging modalities (Van Ackeren et al., 2012; Egorova et al., 2013, 2014). These studies have demonstrated compellingly that activating the neural motor system is not dependent on the word form alone, but rather involves an additional inferential step in which the communicative message of the utterance is extracted.
It is currently not well understood what additional computations are required to mediate between language and distributed semantic systems. As some critics of embodied theories of language argue, one possibility is that the motor system is activated as a result of spreading activation from perisylvian language regions that have been shown to be involved in sentence level language processing (Mahon & Caramazza, 2008). In contrast, more recent accounts have demonstrated that regions sensitive to linguistic/semantic difficulty in language can be partially dissociated from regions involved in generating a communicative intent (Willems et al., 2010; Hagoort, 2013). Specifically, the medial prefrontal cortex (mPFC) and temporal parietal junction (TPJ), components of the mentalising network, have been shown to respond whenever a person thinks about motivations and beliefs of another (Gallagher & Frith, 2003; Saxe & Kanwisher, 2003; Saxe & Wexler, 2005; Saxe, 2006). In line with these findings, studies on indirect requests report activation in mentalising regions alongside the neural motor system (Van Ackeren et al., 2012; Egorova et al., 2014). It is currently not known whether the activation in the neural motor system during indirect requests, that is, the marker for semantic retrieval of motor knowledge, is driven by perisylvian language regions involved in processing complex language input, or the mentalising network involved in inferring the communicative intent of the speaker.
One way to address this question in functional imaging data is through Dynamic Causal Modelling (DCM) (Friston, 2003; Penny et al., 2004; Daunizeau et al., 2011). In DCM, the interactions between regions are modelled at the neuronal level using a bilinear state equation. A carefully motivated model space is defined using three different parameters. These parameters are (i) direct inputs to a given region, (ii) intrinsic connections between regions and (iii) modulations of these connections by experimental perturbations. Subsequently, Bayesian Model Selection (BMS) is used to evaluate which model optimally predicts the observed data. DCM has been successfully applied to model the causal architecture of low-level processes such as visual processing (Pinotsis et al., 2013), as well as high-level processes during theory of mind tasks (Hillebrandt et al., 2013). For the purpose of the present discussion, DCM is a particularly useful tool, as it provides a way to estimate which causal architecture best explains the pattern of activation in language, mentalising and motor networks during indirect speech processing.
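For reference, the bilinear state equation mentioned above is usually written as follows. This is the standard notation of the DCM literature rather than notation introduced in this article: $x$ is the vector of neuronal states (one per region) and $u_j$ is the $j$-th of $m$ experimental inputs.

```latex
\dot{x} = \Bigl( A + \sum_{j=1}^{m} u_j \, B^{(j)} \Bigr)\, x + C\, u
```

In this form, $C$ holds the direct inputs (i), $A$ the intrinsic connections (ii), and each $B^{(j)}$ the modulation of those connections by input $u_j$ (iii).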
The first aim of the current study was to test how mentalising and language networks interact during indirect speech processing. Specifically, understanding indirect speech could rely on projections from mentalising to language networks, from language to mentalising networks, or indeed a mutual exchange between the two. Addressing this question is highly relevant for understanding the role of the mentalising network for indirect speech processing. The second aim of the study was to clarify how indirect requests engage the neural motor system (Van Ackeren et al., 2012). Specifically, we tested whether the motor system is recruited directly via the mentalising network, or rather through its putative connections with the language network.
We used indirect speech as a model in which all three systems have been shown to be involved (Van Ackeren et al., 2012). Participants in the scanner listened to short dialogues in which, depending on the question, the same reply could be interpreted as a simple statement, an indirect reply, or an indirect request. While all three conditions engage the language network, indirect speech will also activate the mentalising network, and indirect requests for action the neural motor system. From each of these networks, the time course from one representative region was used to specify candidate models using DCM. The seed region for the action network was functionally defined, while the seed regions for the mentalising (mPFC) and language networks (IFG) were defined as the functional peaks in regions that had been shown to be uniquely associated with one or the other system (Willems et al., 2010). Finally, we used BMS to estimate the most likely model given the data.
Materials and methods
Participants
A total of 25 healthy female participants between 18 and 35 years took part in the current study for course credits or monetary compensation. Due to excessive movement or failure to respond to the catch trials, three participants were excluded from the dataset prior to the analysis. All participants were right handed and reported that British English was their first language. None of the participants reported a known neurological disorder, or uncorrected auditory or visual impairment. The study was in accordance with the Declaration of Helsinki and approved by the local ethics committee of the York Neuroimaging Centre (YNIC).
Stimuli
The stimuli consisted of 144 spoken dialogues between two individuals (Speaker A, Speaker B). 108 of the stimuli comprised intelligible dialogues between A and B, while 36 stimuli were non-intelligible (i.e. backwards speech). In the intelligible trials A always asked a question, which was answered by B. The question-answer pairings resulted in three experimental conditions: (i) Direct Reply trials, in which B’s response is a literal and factual response to A’s question; (ii) Indirect Reply trials, in which B’s response is a non-literal reply to A’s question that requires some inference about B’s meaning to be drawn; (iii) Indirect Request for Action trials in which B’s response is a non-literal reply to A’s question that furthermore suggests that B requires A to perform an action. Examples are provided in Table 1. The complete set of stimuli will be made available upon request.
Table 1.
Example dialogues from the stimulus set
Condition | Speaker A | Speaker B | Level of meaning |
---|---|---|---|
Direct Reply | How far away is China? | It is quite far away. | Coded meaning |
Indirect Reply | Have you started preparing for the exam? | It is quite far away. | Speaker meaning |
Indirect Request | Shall I move the TV closer to the sofa? | It is quite far away. | Speaker meaning + action |
A given reply (‘It is quite far away’) is interpreted as a direct reply, indirect reply or indirect request, depending on the question.
There were 36 trials in each of the four conditions (three intelligible conditions and one non-intelligible condition).
Stimuli were selected based on two iterations of piloting in which 23 participants in total were asked to indicate whether a given reply was direct or indirect, and whether the goal of the reply was to elicit an action. Items were categorised as indirect replies if at least 75% of respondents thought it was indirect rather than direct. In addition, items were categorised as indirect request, if at least 75% of respondents indicated that the reply was a request for an action. The dialogues were recorded using a male and a female speaker, and the speaker was counterbalanced across trials. All auditory stimuli were amplitude normalised using Praat software (www.praat.org).
Stimulus presentation
Auditory stimuli were adjusted to 10 dB and presented via headphones to participants in the scanner. In addition, participants used earplugs. This procedure ensured that the speech was clearly intelligible, while still within the regulation for noise exposure in the UK.
Each trial began with a jittered interval of fixation (4000–6000 ms), followed by the question (∼1530 ms), a rest (4000 ms) and the reply (∼1220 ms). To indicate who was speaking, each utterance was accompanied by a visually presented letter (A or B). The voice of the speaker was counterbalanced. Participants were instructed to listen to the conversation carefully and think about whether B’s response implied a request for A to act. To ensure that participants were engaged in the task, catch trials were introduced on 10% of trials. On catch trials, participants were asked to indicate whether B wants A to perform an action. Participants responded using their right hand via a non-magnetic button box inside the scanner.
fMRI data acquisition and pre-processing
MRI data acquisition was performed at the York Neuroimaging Centre on a GE HDx Excite MRI scanner with a magnetic field strength of 3 T. Functional volumes were collected with 34 axial slices using a gradient EPI sequence (TR = 2 s, TE = 19 ms, flip angle 90°, FOV 19.2 × 19.2 cm, voxel dimensions 3 × 3 × 3 mm, matrix size: 64 × 64). The data acquisition was performed in two separate runs each containing 575 volumes, and lasting approximately 19 min. Following the functional data acquisition, a T1-weighted structural scan was acquired with 192 sagittal slices (TR = 3 s, TE = 7.8 ms, flip angle: 20°, voxel dimensions 1.13 × 1.13 × 1 mm, matrix size: 256 × 256 × 176, FOV 290 × 290 × 176 mm).
All analyses were performed using SPM 8 (Statistical Parametric Mapping, www.fil.io.ucl.uk/spm) on Matlab 2012a (Mathworks, Natick, MA). The data were read in excluding the first 5 volumes to avoid T1 equilibration effects. Functional images were movement corrected, slice time corrected and normalised to a standard EPI template. Subsequently, the normalised functional image was used to co-register the structural T1-weighted image. Finally, the functional images were convolved with a smoothing kernel of 8-mm FWHM, and high pass filtering (cut-off period: 128 s) was applied to correct for slow drifts in the data.
GLM analysis
General linear modelling (GLM) was used to identify regions that are sensitive to (i) intelligible speech, (ii) indirect speech and (iii) requests for action. The data were analysed using an event-related design (epoch = 2 s) centred on the reply, which was the same across all conditions, but interpreted differently depending on the question. Participants’ movement, and responses, as well as time and dispersion parameters were modelled as effects of no interest. We also modelled speaker A’s question, which was presented 4 s before the period of interest (i.e. the reply), as an effect of no interest. To identify regions sensitive to language we contrasted the three intelligible conditions (direct reply, indirect reply and indirect request) against the backwards speech condition. To identify mentalising regions involved in processing indirect speech, all indirect conditions (indirect reply, indirect request) were contrasted with the direct reply condition. Finally, to identify parts of the neural motor system that are sensitive to the retrieval of action-related semantic knowledge, we contrasted the indirect request versus indirect reply condition. Second-level analysis was performed at the group level. To correct for multiple comparisons a cluster extent threshold was applied. The cluster threshold was determined by computing 1000 simulations of whole-brain fMRI activity maps using an 8 mm FWHM smoothing kernel, and a voxel size of 3 × 3 × 3 mm. Assuming an individual voxel type I error rate at α = 0.005, we estimated that the probability of finding a contiguous cluster of 15 or more voxels (405 mm³) is ≤0.05. This threshold was applied to all statistical maps. The procedure is explained in Slotnick et al. (2003) and the Matlab code can be obtained from the author's website (https://www2.bc.edu/sd-slotnick/scripts.htm).
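The logic of such a Monte Carlo cluster-extent estimate can be sketched in a few lines of Python. This is an illustrative re-implementation under simplifying assumptions and not the Matlab script cited above: the volume is a plain rectangular grid rather than a brain mask, voxels are thresholded one-tailed, clusters are defined by face connectivity, and all function names are hypothetical.

```python
import math
import random
from collections import deque
from statistics import NormalDist


def smooth_axis(vol, shape, axis, kernel):
    """Convolve a flat 3-D volume with a 1-D kernel along one axis (edges clamped)."""
    strides = (shape[1] * shape[2], shape[2], 1)
    radius = len(kernel) // 2
    out = [0.0] * len(vol)
    for idx in range(len(vol)):
        pos = (idx // strides[axis]) % shape[axis]
        acc = 0.0
        for k, w in enumerate(kernel):
            p = min(max(pos + k - radius, 0), shape[axis] - 1)
            acc += w * vol[idx + (p - pos) * strides[axis]]
        out[idx] = acc
    return out


def max_cluster_size(mask, shape):
    """Size of the largest face-connected cluster of True voxels."""
    nx, ny, nz = shape
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    seen = [False] * len(mask)
    best = 0
    for start in range(len(mask)):
        if not mask[start] or seen[start]:
            continue
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:
            idx = queue.popleft()
            size += 1
            x, y, z = idx // (ny * nz), (idx // nz) % ny, idx % nz
            for dx, dy, dz in steps:
                a, b, c = x + dx, y + dy, z + dz
                if 0 <= a < nx and 0 <= b < ny and 0 <= c < nz:
                    n = (a * ny + b) * nz + c
                    if mask[n] and not seen[n]:
                        seen[n] = True
                        queue.append(n)
        best = max(best, size)
    return best


def cluster_extent_threshold(shape, fwhm_mm=8.0, voxel_mm=3.0, voxel_p=0.005,
                             cluster_p=0.05, n_sims=1000, seed=0):
    """Smallest extent k with P(largest noise cluster >= k) <= cluster_p."""
    rng = random.Random(seed)
    # Convert FWHM (mm) to the Gaussian standard deviation in voxel units.
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0))) / voxel_mm
    radius = max(1, math.ceil(3 * sigma))
    kernel = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    z_crit = NormalDist().inv_cdf(1.0 - voxel_p)  # one-tailed voxel threshold
    n_vox = shape[0] * shape[1] * shape[2]
    maxima = []
    for _ in range(n_sims):
        vol = [rng.gauss(0.0, 1.0) for _ in range(n_vox)]
        for axis in range(3):
            vol = smooth_axis(vol, shape, axis, kernel)
        # Re-standardise the smoothed noise so the voxel threshold applies.
        mean = sum(vol) / n_vox
        sd = math.sqrt(sum((v - mean) ** 2 for v in vol) / n_vox)
        mask = [(v - mean) / sd > z_crit for v in vol]
        maxima.append(max_cluster_size(mask, shape))
    for k in range(1, max(maxima) + 2):
        if sum(m >= k for m in maxima) / n_sims <= cluster_p:
            return k
```

A call such as `cluster_extent_threshold((20, 20, 20), n_sims=200)` returns an extent in voxels; multiplying by the voxel volume (27 mm³ here) gives the cluster volume. The value obtained depends on the volume size, the connectivity rule and the number of simulations, so it should not be expected to reproduce the 15-voxel threshold reported above.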
Connectivity analysis using DCM
We used DCM to evaluate how language, mentalising and neural motor systems communicate during indirect speech processing. As DCM is a highly theory-driven method that performs best with a small number of regions, we decided to use one representative region from each of the three networks of interest. These regions were the IFG (language network), the mPFC (mentalising network) and the IPL (action network). The choice of the IFG and mPFC was guided by our GLM analysis as well as previous research showing that each region is uniquely sensitive to semantic or mentalising aspects of a task (Willems et al., 2010). In addition, the IFG is considered a linguistic unification zone and has been repeatedly found to be sensitive to semantic integration at the sentence level (Grewe et al., 2005; Hagoort, 2005; Rogalsky & Hickok, 2011; Hagoort, 2013). Finally, the choice of the IPL as part of the neural motor system was informed by the GLM analysis, as well as previous work from our group (Van Ackeren et al., 2012).
Time series were extracted for each individual participant from voxels in a 6 mm sphere. The centre of this sphere was determined based on the individual subject peak within a radius of 15 mm around the group peak in the GLM analysis of the respective contrast of interest. Time series could be extracted reliably (P < 0.05, uncorrected) from all regions of interest in 19 out of 22 participants. These time series were used to construct the relevant models in our model space. Modelling was performed using deterministic, bilinear, one-state models with mean-centred inputs. Common to all models in our model space, intrinsic connections were assumed to connect regions of interest in both directions. Furthermore, all intelligible speech conditions were assumed to enter the IFG as a driving input. The rationale for choosing the IFG as the input region was that all subsequent processing relies on an initial stage of linguistic parsing and integration. While the IFG is certainly not an early language region, we consider it a bottleneck where phonological, syntactic and semantic information converge, an assumption that is well grounded in the literature (Hagoort, 2005, 2013). More formally, the minimal requirement for an input region in DCM is that the region responds to all conditions in the model. Here, we used a conjunction analysis overlaying each of the three speech conditions (direct, indirect reply, indirect request) versus rest. This supplementary analysis showed overlapping activation between all three conditions in bilateral auditory cortex up to the level of IFG. Therefore, we conclude that the choice for the IFG as the driving input to our models is valid both from a theoretical and methodological point of view.
The aim of the DCM was to test how language and mentalising networks interact during indirect speech processing and which of them modulates the neural motor system when retrieval of action knowledge is required (i.e. during indirect requests). To address these questions, we constructed nine models, which varied on these dimensions. Specifically, information flow between IFG and mPFC during indirect speech processing could be bidirectional or in one direction only. Additionally, IPL activation during indirect requests could be driven by the IFG, mPFC or both (Figure 1A). To decide which model best explains the data, we used random-effects Bayesian Model Selection (BMS).
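The two dimensions described above span a 3 × 3 model space, which can be enumerated as in the following sketch. The dictionary layout, the field names and the ordering of the models are assumptions made for illustration, and the resulting model numbers need not match those in Figure 1A. The family labels follow the partition described in the Results.

```python
from itertools import product

# Dimension 1: which region(s) modulate the connection into IPL during
# indirect requests. Keys are the family labels used in the Results.
IPL_DRIVERS = {1: ("mPFC",), 2: ("IFG",), 3: ("mPFC", "IFG")}

# Dimension 2: direction of the IFG-mPFC modulation during indirect speech.
IFG_MPFC_FLOW = {
    "bidirectional": [("IFG", "mPFC"), ("mPFC", "IFG")],
    "IFG->mPFC": [("IFG", "mPFC")],
    "mPFC->IFG": [("mPFC", "IFG")],
}


def build_model_space():
    """Return the nine candidate models as plain dictionaries."""
    models = []
    for (family, sources), (flow, pairs) in product(IPL_DRIVERS.items(),
                                                    IFG_MPFC_FLOW.items()):
        models.append({
            "id": len(models) + 1,           # arbitrary numbering (assumption)
            "family": family,
            "flow": flow,
            "driving_input": {"S": "IFG"},   # all intelligible speech enters IFG
            "modulation_indirect": list(pairs),                         # input I
            "modulation_request": [(src, "IPL") for src in sources],    # input A
        })
    return models
```

Each entry lists directed (source, target) pairs, which would translate into non-zero entries of the corresponding modulatory matrix when the models are specified in a DCM toolbox.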
Fig. 1.
Results of the DCM analysis testing the influence of language and mentalising networks on the neural motor system during indirect requests. (A) Illustrated are the nine models that were considered in the BMS procedure. Arrows indicate direct inputs, and modulatory connections of all intelligible speech conditions (S), all indirect conditions (I) and the action request condition (A). Colour codes are used to highlight areas that are predominantly part of the language (blue), mentalising (green) and action system (red). The shaded columns depict the partitioning of our model space into the three different families. (B) The left panel shows the expected probability for the three model families. The right panel depicts the exceedance probability of the three model families. (C) Depicted are the parameter estimates of the direct inputs and modulatory connections in the winning model family (Family 1) that are significantly different from zero. Both indirect speech conditions (reply, request) enhance effective connectivity from IFG to mPFC, and indirect requests facilitate connectivity from mPFC to IPL. (D) Intrinsic connectivity estimates from the weighted average of the models in the winning model family (Family 1). Significant intrinsic connectivity is observed from IPL to mPFC and IFG and from IFG to IPL and mPFC. No significant intrinsic connectivity emerges from mPFC.
Results
Behavioural results
Wilcoxon signed rank tests were used to test whether the median proportion of interpreted actions across subjects was different for indirect requests versus the direct and indirect replies. As predicted, indirect requests were more likely to be interpreted as requiring an action (median = 0.75) than both the direct (median = 0) and indirect replies (median = 0) (request versus direct: z = −3.75, P < 0.001; request versus indirect: z = −3.75, P < 0.001). There was no significant difference between the direct and indirect replies (direct versus indirect: z = −0.629, P > 0.5).
Whole-brain analysis
Whole-brain analysis of the contrast between all intelligible speech conditions (direct reply, indirect reply and indirect request) vs reversed speech revealed a mostly left lateralised language network including large portions of the temporal lobe and inferior frontal gyrus. Furthermore, the contrast between indirect vs direct speech revealed a cluster in the left IFG, as well as mPFC and SMA. Additional clusters were observed in bilateral caudate nucleus. Finally, the contrast between indirect requests versus indirect replies revealed activation in the neural motor system that is often observed when participants process action-related language content. These areas include the left pre-central gyrus, and IPL. The peak activation of the indirect and action contrasts is represented in Table 2. Clusters of activation are illustrated in the statistical activation maps in Figure 2.
Table 2.
Peak activation of the significant clusters in the indirect, and action contrast
Region | Cluster extent (voxels) | Peak t | Peak equivZ | x | y | z |
---|---|---|---|---|---|---|
Indirect speech > Direct speech | | | | | | |
Left inferior frontal gyrus | 358 | 4.76 | 3.88 | −57 | 14 | 19 |
Left medial prefrontal cortex | 113 | 3.66 | 3.18 | −9 | 41 | 37 |
Left SMA | 116 | 3.79 | 3.27 | −6 | 5 | 64 |
Left caudate | 216 | 3.52 | 3.09 | −9 | 8 | 4 |
Right caudate | | 3.50 | 3.07 | 12 | 8 | 7 |
Indirect request > Indirect reply | | | | | | |
Left pre-central gyrus | 201 | 5.04 | 4.04 | −36 | −4 | 43 |
Left inferior parietal lobule | 341 | 5.36 | 4.21 | −54 | −43 | 49 |
Right inferior parietal lobule | 131 | 4.84 | 3.92 | 57 | −52 | 49 |
Depicted are the largest clusters of activation (k > 100).
Fig. 2.
Results from the whole brain GLM analysis projected on an MNI inflated surface. Illustrated are the statistical maps (P < 0.05, cluster-corrected) of the intelligible speech contrast (top-panel, blue), the indirect speech contrast (middle panel, green) and the action contrast (bottom panel, red). Emphasised are the three locations (IFG, mPFC, IPL), which were subsequently entered into the DCM analysis.
Bayesian model selection
The random-effects BMS procedure comparing the nine different models revealed that the winning model was model 1, where modulatory connections from mPFC drive both IFG and IPL activation. However, with an exceedance probability, that is the probability that model 1 outperforms all other models, of merely 0.69 the evidence in favour of the winning model cannot be considered conclusive.
As the main goal of the current study was to investigate whether activation in the neural motor system during indirect request processing is driven by the language versus mentalising system, we followed up our preliminary analysis with a family-level random-effects BMS procedure. In family level BMS, the model space is partitioned into model families of equal size, and a weighted average is computed for each partition on the basis of the individual model posterior probability (Penny et al., 2010; Stephan et al., 2010). The advantage of this approach is that inferences can be drawn about specific model parameters taking into account the uncertainty introduced by all other variable parameters in the model space. Family level inference is particularly useful if no single winning model can be identified in the BMS procedure on the whole model space. Here, we partitioned our model space into three different model families (Figure 1A). Family 1 contained all models where IPL activation is driven only by mPFC. Family 2 included the three models where IPL activation was driven by IFG and Family 3 captured models where both mPFC and IFG drive IPL activation.
The random-effects BMS procedure on the three model families revealed that IPL activation is most likely driven by mPFC (and IFG), with an exceedance probability of 0.94. Family 1 (mPFC driving IPL) alone accounted for an exceedance probability of 0.84 (Figure 1B). Taken together, this is evidence that the activation in IPL during indirect request processing is most likely driven by mPFC, or mPFC and IFG together, but not IFG alone. In other words, the activation of the motor system seems to be modulated via a pathway from the mentalising system, but not the language system alone.
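To make the family-level quantities concrete, the sketch below shows one common way of obtaining expected and exceedance probabilities for model families from a random-effects BMS result: model frequencies are sampled from the Dirichlet posterior and summed within each family. The function name is hypothetical, and any Dirichlet parameters passed in for demonstration are made-up values, not the estimates obtained in this study.

```python
import random


def family_probabilities(alpha, families, n_samples=20000, seed=0):
    """Expected and exceedance probabilities per family.

    alpha    -- Dirichlet parameters, one per model (from random-effects BMS)
    families -- family label for each model, in the same order as alpha
    """
    rng = random.Random(seed)
    labels = sorted(set(families))
    total = sum(alpha)
    # Expected family probability: summed Dirichlet means of its member models.
    expected = {f: sum(a for a, fam in zip(alpha, families) if fam == f) / total
                for f in labels}
    wins = {f: 0 for f in labels}
    for _ in range(n_samples):
        # Independent gamma draws are an unnormalised Dirichlet sample;
        # normalising is unnecessary for finding the largest family.
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        sums = {f: 0.0 for f in labels}
        for g, fam in zip(draws, families):
            sums[fam] += g
        wins[max(sums, key=sums.get)] += 1
    # Exceedance probability: how often a family is the most frequent one.
    exceedance = {f: wins[f] / n_samples for f in labels}
    return expected, exceedance
```

Because the exceedance probabilities of the families sum to one, the value for two families taken together is simply the sum of their individual values.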
To further corroborate these results, we computed one-sample t-tests on the weighted parameter estimates of all modulatory connections and direct inputs in the winning model family (Family 1). Confirming the results of the BMS procedure, our analysis revealed a significant positive modulation from mPFC to IPL during the indirect request condition [t(18) = 2.46, P = 0.02]. In addition, we found significant modulations from IFG to mPFC during both indirect speech conditions [indirect reply: t(18) = 2.31, P < 0.033; indirect request: t(18) = 4.25, P < 0.001]. Finally, all three speech conditions showed significant direct inputs to the IFG [direct: t(18) = 30.29, P < 0.001; reply: t(18) = 36.22, P < 0.001; request: t(18) = 43.80, P < 0.001]. These results as well as the mean connection weights are illustrated in Figure 1C. Notably, all modulatory and direct parameters significantly different from zero had a positive sign.
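The t-statistics above are one-sample tests of a connection parameter against zero across the 19 participants entered into the DCM, which is where the 18 degrees of freedom come from. A minimal sketch of the computation is given below; the function name is hypothetical, and the P value is left out because it requires the t distribution, which is not part of the Python standard library.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu0=0.0):
    """Return (t, degrees of freedom) for a one-sample t-test against mu0."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)  # stdev uses n - 1
    return (mean(values) - mu0) / standard_error, n - 1
```

Passing the 19 per-participant estimates of a single connection returns the t-value together with 18 degrees of freedom; a positive t indicates a positive mean parameter.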
Finally, we used one-sample t-tests to investigate whether the weighted parameter averages of the intrinsic connections in the winning model family (Family 1) differ significantly from zero. Here, we found enhanced connectivity from IFG to mPFC [t(18) = 9.50, P < 0.001] and IPL [t(18) = 7.45, P < 0.001], and significant negative connectivity from IPL to mPFC [t(18) = −6.10, P < 0.001] and IFG [t(18) = −3.45, P < 0.003]. Interestingly, we find no evidence for intrinsic or latent connectivity in the absence of a task from mPFC to any of the other two regions. This pattern of results is not uncommon in DCM and suggests that there is no evidence for a directed modulation from mPFC to any of the other regions unless in the context of an indirect request.
Discussion
Human communication involves understanding others on both a linguistic and a social level. In the current study, we investigated the neural dynamics involved in processing direct and indirect speech in order to shed light on how high-level cognitive networks (e.g. language, mentalising/ToM and distributed semantics) interact to support social communication. Our results demonstrate that interpreting indirect speech enhances the flow of information from language to mentalising networks. Furthermore, if the speaker makes an indirect request to encourage the listener to perform a physical action, we observe effective connectivity from the mentalising to the neural motor system.
Inferring the speaker’s beliefs induces enhanced communication between language and mentalising networks
Evidence from developmental and comparative studies suggests that the ontogenetic and phylogenetic development of human language is built onto an infrastructure for mind reading, and social interaction (Tomasello, 2008). In contrast, others have argued that the complexity of human mind reading arises from our rich language infrastructure (Carruthers, 2002). Although there is much debate about the relationship between language and our ability to mentalise, few would object to the hypothesis that the two cognitive functions are closely intertwined.
This close coupling between the two systems has also been shown empirically. For example, Newton and De Villiers (2007) demonstrated that during a verbal shadowing task participants’ performance on the false belief task (a well-recognised test of mentalising abilities) is compromised. Yet, it has also been demonstrated that aphasic patients with severe language impairments perform well on the false belief task, suggesting that the two systems can also operate independently (Varley & Siegal, 2000). This finding was further corroborated by functional imaging showing that the neural substrates for language processing and building a communicative intent can be partially dissociated (Willems et al., 2010), and that listeners recruit the different pathways flexibly during language comprehension (Nijhof & Willems, 2015). However, although language and mentalising networks might not share a common neural pathway, there is abundant interaction between the two networks.
Indeed electrophysiological studies have shown that inferences about mental states of others modulate neurophysiological responses to language in a time window overlapping with or even preceding that of semantic access (Egorova et al., 2013; Rueschemeyer et al., 2015). For example, Egorova et al. (2013) demonstrated that EEG responses to an object name are modulated as early as 100–200 ms if the utterance is interpreted as a request for that object (pragmatic inference). Furthermore, Rueschemeyer et al. (2015) have shown that the amplitude of the N400, a classic ERP component linked to semantic integration, is modulated if participants are aware that another person perceives a sentence to contain a semantic violation, even if the participant him/herself judges the sentence to be correct. The latency of this ‘Social N400-Effect’ does not differ from that of the canonical N400-Effect. These studies suggest that processing another person’s beliefs induces neuronal changes that precede or overlap in time with language comprehension.
The current study builds on these results, demonstrating that the mPFC, a region functionally tuned to mental state inferences (Willems et al., 2010), receives continuous input from both language (IFG) and distributed semantic networks (IPL), the former of which is enhanced if participants infer meaning from indirect speech content. These enhanced projections from IFG to mPFC could thus reflect a neural correlate of the demands to go beyond the coded meaning of the utterance and generate hypotheses about the interlocutor's beliefs and motivation.
It should be noted, though, that in the GLM analysis we also found a robust IFG cluster when contrasting indirect versus direct speech processing. This result is inconsistent with the dissociation described by Willems et al. (2010). One possible explanation for this observation is that processing indirect speech by itself requires additional inferential steps that might not necessarily be of a social nature. That is, up to the point where speaker meaning can be accessed via the mentalising network, interpreting an utterance at the level of coded meaning could be inherently more taxing for the language system itself. In addition, an important difference between the current study and the study by Willems and colleagues is that participants in their study were asked to design a message for another speaker, rather than interpret a given message. As such, the information transfer from one system to the other might be reversed and potential ambiguity might be resolved within the mentalising system, even before it reaches the language output.
Mentalising engages the neural motor system during indirect requests
Over the last decade, a number of studies have provided compelling evidence that perceptual and motor networks in the brain are activated when participants access semantic information through language (Barsalou, 2008; Pulvermüller & Fadiga, 2010; Glenberg & Gallese, 2012). A long-standing discussion in this field pertains to what drives the activation patterns in these networks. For example, action verbs could trigger the motor system automatically through direct connections with language areas, or rather through an indirect inferential step.
The study of indirect language provides an opportunity to directly address this question. For example, an utterance such as ‘It is hot in here’ should elicit the retrieval of action knowledge when it is interpreted as an indirect request to open the window, but not when it is interpreted as a statement about the weather. As no direct action words are used in these sentences, activation of the motor system cannot be explained by an automatic activation, but has to be the result of a prediction about the intention of the other speaker. Indeed, Van Ackeren et al. (2012) found that these requests activate the neural motor system, as well as a mentalising network, which could reflect a neural substrate for this prediction. While this is evidence that the motor system relies on a secondary inferential step, the study could not disentangle the individual contributions of the motor and mentalising system during indirect speech processing.
Here, we demonstrate that indirect speech engages parts of the mentalising network (mPFC), while only indirect requests additionally activate the neural motor system (IPL, PMC). Corroborating the conclusions from Van Ackeren et al. (2012), this is evidence that the two networks are distinct and that the neural motor system in particular is sensitive to the action-related content in the utterance. Yet, the most important finding of the current study is that the neural motor system (IPL) seems to be driven primarily by the mentalising system (mPFC), and not the language system alone (IFG).
These results need to be interpreted in the context of the methods used. That is, while we show that mentalising rather than language networks modulate activity in the motor system, we cannot rule out the possibility that other regions contribute to this pattern of activation as well. DCM was employed to answer very specific questions about the relationship between the three networks, and further studies are needed to study the interactions between mentalising and other functional networks. One promising development in this direction is the study of indirect speech with emotional content (Basnáková et al., 2013; Lai et al., 2015).
Conclusion
The current results are in line with a more integrated account of language comprehension in which natural language processing is the result of dynamic interactions between classical language, mentalising and distributed semantic perception/action networks. While each of these networks has been studied extensively, the journey towards understanding how these different systems work together is only just beginning. Here, we present a first step in this direction by demonstrating how the mPFC, a critical component of the mentalising network, modulates activity in classical language areas and the neural motor system when participants process indirect speech. We hope that future work using fast neurophysiological measures such as MEG and EEG will further corroborate and extend these findings.
Conflict of interest. None declared.
References
- Barsalou L.W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–45.
- Basnáková J., Weber K., Petersson K.M., van Berkum J., Hagoort P. (2013). Beyond the language given: the neural correlates of inferring speaker meaning. Cerebral Cortex, 24, 2572–8.
- Boulenger V., Hauk O., Pulvermüller F. (2009). Grasping ideas with the motor system: semantic somatotopy in idiom comprehension. Cerebral Cortex, 19, 1905–14.
- Catani M., Bambini V. (2014). A model for social communication and language evolution and development (SCALED). Current Opinion in Neurobiology, 28, 165–71.
- Carruthers P. (2002). The cognitive functions of language. Behavioral and Brain Sciences, 25, 657–726.
- Daunizeau J., David O., Stephan K.E. (2011). Dynamic causal modelling: a critical review of the biophysical and statistical foundations. Neuroimage, 58, 312–22.
- Egorova N., Pulvermüller F., Shtyrov Y. (2014). Neural dynamics of speech act comprehension: an MEG study of naming and requesting. Brain Topography, 27, 375–92.
- Egorova N., Shtyrov Y., Pulvermüller F. (2013). Early and parallel processing of pragmatic and semantic information in speech acts: neurophysiological evidence. Frontiers in Human Neuroscience, 7, 86.
- Fedorenko E., Thompson-Schill S.L. (2014). Reworking the language network. Trends in Cognitive Sciences, 18, 120–7.
- Ferstl E.C., Neumann J., Bogler C., Cramon D.Y.V. (2008). The extended language network: a meta-analysis of neuroimaging studies on text comprehension. Human Brain Mapping, 29, 581–93.
- Friston K. (2003). Dynamic causal modelling. In: Ashburner J., Friston K., Penny W., editors. Human Brain Function. 2nd edn. London: Elsevier, 1063–90.
- Gallagher H.L., Frith C.D. (2003). Functional imaging of “theory of mind”. Trends in Cognitive Sciences, 7, 77–83.
- Glenberg A.M., Gallese V. (2012). Action-based language: a theory of language acquisition, comprehension, and production. Cortex, 48, 905–22.
- Goldberg R.F., Perfetti C.A., Schneider W. (2006). Perceptual knowledge retrieval activates sensory brain regions. Journal of Neuroscience, 26, 4917–21.
- Grewe T., Bornkessel I., Zysset S., Wiese R., Von Cramon D.Y., Schlesewsky M. (2005). The emergence of the unmarked: a new perspective on the language-specific function of Broca’s area. Human Brain Mapping, 26, 178–90.
- Grice P. (1975). Logic and conversation. In: Cole P., Morgan J.L., editors. Syntax and Semantics. New York: Academic Press, 41–58.
- Hagoort P. (2005). On Broca, brain, and binding: a new framework. Trends in Cognitive Sciences, 9, 416–23.
- Hagoort P. (2013). MUC (Memory, Unification, Control) and beyond. Frontiers in Psychology, 4, 1–13.
- Hauk O., Johnsrude I., Pulvermüller F. (2004). Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41, 301–7.
- Hillebrandt H., Dumontheil I., Blakemore S.J., Roiser J.P. (2013). Dynamic causal modelling of effective connectivity during perspective taking in a communicative task. Neuroimage, 76, 116–24.
- Holtgraves T. (1999). Comprehending indirect replies: when and how are their conveyed meanings activated? Journal of Memory and Language, 41, 519–40.
- Jang G., Yoon S.A., Lee S.E., et al. (2013). Everyday conversation requires cognitive inference: neural bases of comprehending implicated meanings in conversations. Neuroimage, 81, 61–72.
- Lai V.T., Willems R.M., Hagoort P. (2015). Feel between the lines: implied emotion in sentence comprehension. Journal of Cognitive Neuroscience, 27, 1528–41.
- Levinson S.C. (2000). Presumptive Meanings: The Theory of Generalized Conversational Implicature. Cambridge, MA: MIT Press.
- Mahon B.Z., Caramazza A. (2008). A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. Journal of Physiology-Paris, 102, 59–70.
- Newton A.M., De Villiers J.G. (2007). Thinking while talking: adults fail nonverbal false-belief reasoning. Psychological Science, 18, 574–9.
- Nijhof A.D., Willems R.M. (2015). Simulating Fiction: Individual Differences in Literature Comprehension Revealed with fMRI. PLoS One, 10(2), e0116492.
- Penny W.D., Stephan K.E., Mechelli A., Friston K.J. (2004). Comparing dynamic causal models. Neuroimage, 22, 1157–72.
- Penny W.D., Stephan K.E., Daunizeau J., Rosa M.J., Friston K.J., Schofield T.M., Leff A.P. (2010). Comparing families of dynamic causal models. PLoS Computational Biology, 6.
- Pinotsis D.A., Schwarzkopf D.S., Litvak V., Rees G., Barnes G., Friston K.J. (2013). Dynamic causal modelling of lateral interactions in the visual cortex. Neuroimage, 66, 563–76.
- Postle N., McMahon K.L., Ashton R., Meredith M., de Zubicaray G.I. (2008). Action word meaning representations in cytoarchitectonically defined primary and premotor cortices. Neuroimage, 43, 634–44.
- Pulvermüller F., Fadiga L. (2010). Active perception: sensorimotor circuits as a cortical basis for language. Nature Reviews Neuroscience, 11, 351–60.
- Raposo A., Moss H.E., Stamatakis E.A., Tyler L.K. (2009). Modulation of motor and premotor cortices by actions, action words and action sentences. Neuropsychologia, 47, 388–96.
- Rogalsky C., Hickok G. (2011). The role of Broca’s area in sentence comprehension. Journal of Cognitive Neuroscience, 23, 1664–80.
- Rueschemeyer S.-A., Gardner T., Stoner C. (2015). The Social N400 effect: how the presence of other listeners affects language comprehension. Psychonomic Bulletin & Review, 22, 128–34.
- Rueschemeyer S.-A., Glenberg A.M., Kaschak M.P., Mueller K., Friederici A.D. (2010a). Top-down and bottom-up contributions to understanding sentences describing objects in motion. Frontiers in Psychology, 1, 183.
- Rueschemeyer S.-A., van Rooij D., Lindemann O., Willems R.M., Bekkering H. (2010b). The function of words: distinct neural correlates for words denoting differently manipulable objects. Journal of Cognitive Neuroscience, 22, 1844–51.
- Rüschemeyer S.-A., Brass M., Friederici A.D. (2007). Comprehending prehending: neural correlates of processing verbs with motor stems. Journal of Cognitive Neuroscience, 19, 855–65.
- Saxe R. (2006). Why and how to study Theory of Mind with fMRI. Brain Research, 1079, 57–65.
- Saxe R., Kanwisher N. (2003). People thinking about thinking people: the role of the temporo-parietal junction in “theory of mind”. Neuroimage, 19, 1835–42.
- Saxe R., Wexler A. (2005). Making sense of another mind: the role of the right temporo-parietal junction. Neuropsychologia, 43, 1391–9.
- Simmons W.K., Ramjee V., Beauchamp M.S., McRae K., Martin A., Barsalou L.W. (2007). A common neural substrate for perceiving and knowing about color. Neuropsychologia, 45, 2802–10.
- Slotnick S.D., Moo L.R., Segal J.B., Hart J. (2003). Distinct prefrontal cortex activity associated with item memory and source memory for visual shapes. Cognitive Brain Research, 17, 75–82.
- Stephan K.E., Penny W.D., Moran R.J., den Ouden H.E.M., Daunizeau J., Friston K.J. (2010). Ten simple rules for dynamic causal modeling. Neuroimage, 49, 3099–109.
- Toni I., de Lange F.P., Noordzij M.L., Hagoort P. (2008). Language beyond action. Journal of Physiology-Paris, 102, 71–9.
- Tomasello M. (2008). Origins of Human Communication. Cambridge, MA: MIT Press.
- Van Ackeren M.J., Casasanto D., Bekkering H., Hagoort P., Rueschemeyer S.-A. (2012). Pragmatics in action: indirect requests engage theory of mind areas and the cortical motor network. Journal of Cognitive Neuroscience, 24, 2237–47.
- Van Ackeren M.J., Rueschemeyer S.-A. (2014). Cross-modal integration of lexical-semantic features during word processing: Evidence from oscillatory dynamics during EEG. PLoS One, 9(7), e101042.
- Van Ackeren M.J., Schneider T.R., Müsch K., Rueschemeyer S.-A. (2014). Oscillatory neuronal activity reflects lexical-semantic feature integration within and across sensory modalities in distributed cortical networks. Journal of Neuroscience, 34, 14318–23.
- Van Dam W.O., Rueschemeyer S.-A., Bekkering H. (2010). How specifically are action verbs represented in the neural motor system: an fMRI study. Neuroimage, 53, 1318–25.
- Van Dam W.O., van Dijk M., Bekkering H., Rueschemeyer S.-A. (2012). Flexibility in embodied lexical-semantic representations. Human Brain Mapping, 33, 2322–33.
- Varley R., Siegal M. (2000). Evidence for cognition without grammar from causal reasoning and “theory of mind” in an agrammatic aphasic patient. Current Biology, 10, 723–6.
- Willems R.M., de Boer M., de Ruiter J.P., Noordzij M.L., Hagoort P., Toni I. (2010). A dissociation between linguistic and communicative abilities in the human brain. Psychological Science, 21, 8–14.
- Wilson D., Sperber D. (2004). Relevance theory. In: Horn L. R., Ward G., editors. The Handbook of Pragmatics. Oxford: Blackwell, 607–32.
- Zwaan R.A. (2003). The immersed experiencer: toward an embodied theory of language comprehension. The Psychology of Learning and Motivation: Advances in Research and Theory, 44, 35–62.