Philosophical Transactions of the Royal Society B: Biological Sciences
. 2019 Mar 11;374(1771):20180033. doi: 10.1098/rstb.2018.0033

Brain activity during reciprocal social interaction investigated using conversational robots as control condition

Birgit Rauchbauer 1,2,4, Bruno Nazarian 1, Morgane Bourhis 1, Magalie Ochs 3, Laurent Prévot 4,5, Thierry Chaminade 1,
PMCID: PMC6452252  PMID: 30852994

Abstract

We present a novel functional magnetic resonance imaging paradigm for second-person neuroscience. The paradigm compares a human social interaction (human–human interaction, HHI) to an interaction with a conversational robot (human–robot interaction, HRI). The social interaction consists of 1 min blocks of live bidirectional discussion between the scanned participant and the human or robot agent. A final sample of 21 participants is included in the corpus comprising physiological (blood oxygen level-dependent, respiration and peripheral blood flow) and behavioural (recorded speech from all interlocutors, eye tracking from the scanned participant, face recording of the human and robot agents) data. Here, we present the first analysis of this corpus, contrasting neural activity between HHI and HRI. We hypothesized that independently of differences in behaviour between interactions with the human and robot agent, neural markers of mentalizing (temporoparietal junction (TPJ) and medial prefrontal cortex) and social motivation (hypothalamus and amygdala) would only be active in HHI. Results confirmed significantly increased response associated with HHI in the TPJ, hypothalamus and amygdala, but not in the medial prefrontal cortex. Future analysis of this corpus will include fine-grained characterization of verbal and non-verbal behaviours recorded during the interaction to investigate their neural correlates.

This article is part of the theme issue ‘From social brains to social robots: applying neurocognitive insights to human–robot interaction'.

Keywords: human interaction, human–robot interaction, second-person neuroscience, functional magnetic resonance imaging

1. Introduction

Humans' social bonds are established and maintained through interactions with others. These interactions ‘are characterized by intricate reciprocal relations with the perception of socially relevant information prompting (re-) actions, which are themselves processed and reacted to' [1, p. 397]. To date, the field of social neuroscience, which investigates the neurophysiological basis of social interactions, has mostly focused on the observation of social signals rather than on truly interactive social settings. In an attempt to capture the interactional dynamics of real life, ‘second-person neuroscience' [1,2] encourages the investigation of naturalistic interactive paradigms for enhanced ecological validity [3]. This approach aims to shift social neuroscience from a prevailing ‘passive spectator science' [4] to one investigating the dynamics of social exchange [4]. This requires that not only the experimental condition but also the control condition preserve the reciprocity of real-time interactions.

Computer-animated on-screen agents have been used to study the influence of animacy on motor imitation [5] and mechanisms of joint attention [6–8], taking advantage of the extensive control the experimenter has over their behaviour [9]. This includes, for example, control over the direction of the gaze, towards or away from a target, or its timing. Robots have also been used as control conditions in social neuroscience experiments [10–13]. This article presents a novel second-person neuroscience paradigm for functional magnetic resonance imaging (fMRI) that uses a conversational robot as a control condition for a human social interaction. Social interaction is operationalized using language, the most ubiquitous form of human interaction. The paradigm allows the recording of brain activity during 1 min live bidirectional discussions between the scanned participant and a fellow human (human–human interaction; HHI) and similar discussions between the same participant and a conversational robot (human–robot interaction; HRI).

The HHI represents the experimental condition, constituting the ‘social' condition. The HRI represents the control condition, which preserves sensorimotor aspects of live, bidirectional conversation. Indeed, the robot has an anthropomorphic outer appearance, including a human face and voice, so that seeing, hearing and talking to the artificial agent is similar to the interaction with the human agent. In addition, while participants believe the robot is autonomous, it is actually controlled by the same individuals who interact with the participants in the HHI conditions. As a consequence, participants are not aware that they interact with the same individual, the confederate, in both HHI and HRI conditions (figure 1a). On the other hand, the conversational robot used in the experiment is clearly not human: the face is projected on a moulded plastic screen, it has a limited number of pre-scripted sentences for conversation and it does not exhibit meaningful facial expressions or speech intonations.

Figure 1.

Experimental design showing (a) the communication between the scanned participant and the other conversation agent, either the confederate or the robot, as well as the recording modalities; (b) the timeline of the experiment showing the alternation between the stimuli and conversation periods, as well as the relative timing. The fruit pictures correspond to the images used in the cover story, while the robot and confederate pictures illustrate episodes of live bidirectional conversations.

A corpus of multimodal data is collected in addition to the fMRI data. Physiological responses (respiration and peripheral blood flow pulse) are recorded in synchrony with the MR scanner and are currently used for modelling and removing physiological noise from the fMRI data. Behavioural data are recorded to enable future exploration of brain–behaviour relations. These include speech production by the scanned participant and by the human and robot agents, video capture of the human and robot agents, and the gaze movements of the scanned participant. Given the unconstrained nature of the conversation task, a fine-grained exploration of the behaviour—in particular, transcription and analysis of the conversations, and exploration of dynamic gaze direction towards the human and robot agents' faces—will be necessary to investigate brain dynamics in the corpus. Here, we focus on the block analysis contrasting conditions HHI and HRI. Given that the robot control condition is designed to reproduce sensorimotor aspects of human conversation, both HHI and HRI are expected to be associated with a neural network involved in visuomotor speech perception and in speech production, including bilaterally the dorsal temporal lobes for speech perception and the ventral and lateral occipital cortex for face perception, as well as the bilateral ventral primary motor cortex (speech motor control) and the left inferior frontal gyrus (Broca's area) for speech production (see [14] for review).

The contrast between conditions HHI and HRI is used to test specific hypotheses about the neural correlates of social cognition, and hence to confirm the quality and validity of the acquired data. Social cognition [15] is broadly defined as ‘the sum of those processes that allow individuals of the same species (conspecifics) to interact with one another’ [16, p. R724]. On the basis of previous work [10–13,17], we specifically expected processes of mentalizing and enhanced social motivation when interacting with the human compared to the robot. Mentalizing is the ascription of mental states such as intentions and beliefs to explain the apparent behaviour of the interaction partner [18]. It requires the adoption of an intentional stance towards the interaction partner—the assumption that the interacting agent actually has a mind supporting its mental states [19]. The adoption of an intentional stance towards a human versus a computer interaction partner has been linked to activation in the paracingulate cortex [17], a region of the medial prefrontal cortex (MPFC). It has been argued that humans do not adopt an intentional stance towards robots, computers and, more generally, artificial agents [19]. Indeed, increased activity in areas associated with mentalizing, not only in the MPFC but also in the temporoparietal junction (TPJ), has repeatedly been found when interacting with a human compared to a robot or a computer [10–13,17]. The contrast HHI versus HRI should therefore activate the temporal and medial prefrontal areas associated with mentalizing.

Also, on the basis of previous results from experiments contrasting human versus robot interactions (e.g. [11,12]), we expected human interaction to elicit activation of neural markers of social motivation, the human drive to interact and to establish and maintain bonds [20]. Chaminade et al. [12] report that modulation of activity in the paraventricular nucleus of the hypothalamus by the social context (human versus robot) is present in neurotypical individuals but not in individuals diagnosed with autism spectrum disorder. This was associated with the proposal that autism is linked with a deficit in social motivation involving disrupted hypothalamic regulation of oxytocin release [20]. Later studies confirmed the modulation of hypothalamus anatomy [21] and activity [22] by the social context. In general, social motivation and reward have been associated with brain activation in the reward circuit, comprising the ventral striatum and the orbitofrontal and ventromedial cortex [20], including the amygdala specifically for social reward [23]. In line with these studies, we expected that interaction with a human would activate the previously reported subcortical areas (in particular the hypothalamus and amygdala) more than interaction with a robot. By contrast, we had no specific hypothesis with regard to brain activity in the reverse contrast HRI versus HHI.

In the following sections, we present the experimental paradigm and the first results of the reciprocal contrasts between conditions HHI and HRI, demonstrating not only the feasibility of our approach but also the scientific quality of the acquired data with regard to our hypotheses.

2. Methods

(a). Participants

Twenty-four native French-speaking participants (seven men), with an average age of 28.5 years (s.d. = 12.4), were scanned with fMRI while having a conversation with a fellow human or a retro-projected conversational robotic head (Furhat Robotics, https://www.furhatrobotics.com/; [24]). Three participants were excluded owing to technical problems and insufficient task compliance. Twenty-one participants (mean age = 25.81 years, s.d. = 7.49) were included in the analysis. Participants received information about the experiment, confirmed their compatibility with MR scanning and gave their informed consent prior to scanning. Eligibility entailed normal or corrected-to-normal vision and no history of psychiatric or neurological conditions. Participants received a flat fee of 40 euros for participation. The study was approved by the ethics committee ‘Comité de Protection des Personnes Sud Méditerranée I'.

(b). Cover story for the experiment

A recent behavioural study comparing human–human with human–robot conversations [25] was adapted to the fMRI environment. The experimental factor was the nature of the interacting agent (human versus robot), in a within-subject block design. A cover story was a fundamental element of the study, as it provided a fake rationale for the experiment as well as a frame for the discussions and explanations for the experimental set-up. Volunteers were told that they were participating in a neuromarketing experiment sponsored by an advertising company. The company wanted to test whether the message of its forthcoming campaign could be identified during a discussion between two people about the presented images of the campaign. Two series of three images presented anthropomorphized fruits and vegetables as superheroes or appearing rotten, respectively (see electronic supplementary material, figure S1). Participants were instructed to talk freely about the presented image with the agent outside the scanner, either a human or the conversational robotic head (controlled by the confederate, unbeknown to the participant; §2c). The robot was presented as an autonomous conversational agent that had information about the advertising campaign. As such, the discussion with the robot could be used to gather information about the campaign.

In practice, the cover story was presented to the participants by experimenter BR in the lobby of the MR centre, later joined by the confederate. Confederates were gender-matched to participants: experimenter TC served as confederate for men and experimenter MB for women. The participant was told that the confederate had already participated in the experiment inside the scanner and had agreed to come back to play the role of the agent outside the scanner. The participant was then accompanied into the control room outside the scanner and shown the robot (see §2c). In the meantime, we asked the confederate to wait, telling him/her that we would first prepare the participant in the scanner. At the end of the experiment, participants were debriefed verbally to verify that they still believed in the cover story, and we then revealed the true objective of the experiment.

(c). Artificial agent

The robotic head from Furhat Robotics (https://www.furhatrobotics.com/; [24]) was used in this study. The robotic head is a semi-transparent plastic mask moulded to mirror the shape of a human face, onto which the image of a human face is retro-projected. In order to match the robot's appearance to the confederates', the face and voice were gender-matched, and a wig, a scarf and headphones were added, as well as glasses for confederate TC (see illustrations in figure 1). Furhat OS allowed us to control the robot's responses through a Wizard of Oz (WOZ) procedure: unbeknown to the participant, the confederate was controlling the robot remotely. The robot's conversational feedback was largely based on actual human interactions recorded during the previous behavioural study [25]. A WOZ user interface was created with Furhat OS, displaying buttons in a web browser running on a tablet and allowing the human controller to launch pre-programmed conversational feedback. For example, clicking the button ‘yes' on the screen would make the robot say ‘yes', and clicking the button ‘superhero' would launch the sentence ‘It looks like a superhero'. Conversational elements included non-specific feedback, such as ‘yes', ‘no’ or ‘maybe', which could be used for all images, as well as feedback specific to each image, such as ‘This lemon looks like a superhero' or ‘Maybe this is a campaign to eat healthier food'. Note that the cover story limited the range of expected conversation topics for each image compared with a fully unconstrained discussion. Overall, about 30 French conversational feedback items were scripted for the robot for each of the six images (see electronic supplementary material, File S1 for the robot statements).
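As an illustration, a minimal sketch of the button-to-utterance logic underlying such a WOZ interface is given below in Python. The mapping, keys and function name are hypothetical and are not the actual Furhat OS skill used in the experiment; the example utterances are the English translations quoted above, whereas the robot actually spoke French.

```python
# Hypothetical sketch of a WOZ button-to-utterance mapping; not the actual Furhat OS skill.
SCRIPTED_FEEDBACK = {
    # non-specific feedback, usable for every image
    "yes": "Yes.",
    "no": "No.",
    "maybe": "Maybe.",
    # image-specific feedback (English translations of examples quoted in the text)
    "superhero": "It looks like a superhero.",
    "lemon_superhero": "This lemon looks like a superhero.",
    "healthy_campaign": "Maybe this is a campaign to eat healthier food.",
}

def on_button_press(button_id: str) -> str:
    """Return the scripted utterance the robot should speak for a given button."""
    return SCRIPTED_FEEDBACK.get(button_id, "Mmh.")  # fallback back-channel

# Example: the confederate taps 'superhero' on the tablet interface
print(on_button_press("superhero"))  # -> "It looks like a superhero."
```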

The robot was controlled via this WOZ interface by the confederate who acted as the conversational agent in the human condition, allowing for a realistic bidirectional conversation similar to the interaction with the human. Thus, unbeknown to the participants, they conversed with the same individual in both the human and the robot conditions. On the other hand, while the conversational robot was able to reproduce superficial aspects of a human conversation, it lacked speech intonation, head movements, facial expressions and the ability to elaborate longer statements, and thus appeared clearly artificial; nevertheless, according to the debriefing, participants believed it was autonomous.

(d). Experimental set-up

The fMRI audio set-up allowed live conversation between the scanned participant, lying supine in the scanner, and the agent outside the scanner despite the noisy MRI environment. It consisted of an active noise-cancelling MR-compatible microphone (FOMRI-III+ from Optoacoustics) mounted on the head coil and insert earphones from Sensimetrics. Live video of the interacting agent (human or robot) was captured by webcams and projected onto a mirror mounted on the head coil in front of the scanned participant's eyes. Videos were recorded for future analysis. The participant's direction of gaze on the projection mirror was recorded (EyeLink 1000 system, SR Research). Stimulus presentation, audio and video routing and recording, and synchronization with the fMRI acquisition triggers and the eye tracker were implemented in a LabVIEW (National Instruments) virtual instrument (figure 1a). Finally, blood pulse and respiration were recorded with the built-in Siemens Prisma hardware and data format.

Altogether, we collected multimodal data including behaviour (speech from the participant and human or robot agent, video capture of the human and robot agent, and the gaze movement of the scanned participant) and physiology (blood oxygen level-dependent (BOLD) signal, respiration and peripheral blood flow pulse) to form a corpus. Transcribed speech data (more details on the transcription and an example of the conversation are provided as electronic supplementary material in §2.1 and Files S3–S5) and fMRI data, both raw and analysed, will be shared in online repositories.

(e). Experimental paradigm

The MRI recordings consisted of four sessions of six 1 min conversation blocks each, showing the ‘superheroes' images in the first and third sessions and the ‘rotten fruits' images in the second and fourth sessions (see electronic supplementary material, figure S1 and table S1 for details). The order was kept constant across participants, with each session alternating between the three images and the two interacting agents (the complete order of conditions is given in electronic supplementary material, table S1). Each image was thus shown twice in each session, once per interacting agent. Given the entertaining nature of the interaction, we did not expect habituation effects to affect the brain imaging data and preferred to have the nature of the agent fully predictable. Hence, we did not randomize the order of presentation of the human and robot agents.

Blocks started with the presentation of one image for 8.3 s, followed by a 3.3 s black screen, after which there was a live bidirectional conversation with the interacting agent for 1 min, followed by an inter-block black screen of 4.6 s (figure 1b). In the absence of a live video feed from inside the scanner, a light signalled to the confederate that the conversation had started. The participant initiated the conversation, having been instructed to talk freely with the other agent about the image and their suggestions on the topic of the advertising campaign. One block lasted 76.2 s and one session comprised 8 min and 2 s of fMRI recording. We recorded 3 min of conversation per interacting agent and session, for a total of 24 min of conversation per participant. The audio and video set-up of the conversation was tested beforehand, and the audio was adjusted individually for each participant. As participants were always connected via audio with the confederate, some indicated that the sound level was not appropriate, giving us the chance to adapt the audio if required. This information was recorded for future use.
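The block timing can be summarized in a short Python sketch. The durations are taken from the text above; the assumption that blocks follow each other without additional lead-in or lead-out time within a session, and the resulting estimate of roughly 400 EPI volumes per run (482 s at TR = 1.205 s, see §2f), are ours and not stated in the original.

```python
# Durations of the elements of one block, in seconds (from the text above)
IMAGE, BLACK_1, CONVERSATION, BLACK_2 = 8.3, 3.3, 60.0, 4.6
BLOCK = IMAGE + BLACK_1 + CONVERSATION + BLACK_2   # 76.2 s per block, as reported

TR = 1.205                                         # repetition time (s), see section 2f
SESSION = 8 * 60 + 2                               # 482 s of fMRI recording per session
n_volumes = SESSION / TR                           # ~400 EPI volumes per run (our estimate)

# Onsets of the six conversation periods within a session, assuming blocks are
# back-to-back from the start of the recording (any extra lead-in time is ignored here)
conversation_onsets = [i * BLOCK + IMAGE + BLACK_1 for i in range(6)]
print(round(BLOCK, 1), round(n_volumes), conversation_onsets)
```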

(f). Magnetic resonance imaging acquisition

MRI data were collected with a 3T Siemens Prisma (Siemens Medical, Erlangen, Germany) using a 20-channel head coil. BOLD-sensitive functional images were acquired using an EPI sequence across the four runs. Parameters were as follows: echo time (TE) 30 ms, repetition time (TR) 1205 ms, flip angle 65°, 54 axial slices co-planar to the anterior/posterior commissure plane, field of view 210 mm × 210 mm, matrix size 84 × 84, voxel size 2.5 × 2.5 × 2.5 mm3, with a multiband acquisition factor of 3. After functional scanning, structural images were acquired with a GR_IR sequence (TE 2.28 ms, TR 2.4 s, 320 sagittal slices, voxel size 0.8 × 0.8 × 0.8 mm3, field of view 204.8 × 256 × 256 mm).

(g). Magnetic resonance imaging data analysis

MRI data were analysed using SPM12 (Statistical Parametric Mapping, http://www.fil.ion.ucl.ac.uk/spm/). First, we calculated the voxel displacement map. The time series were then corrected for differences in slice acquisition time by temporally realigning each slice to the slice acquired at the middle of the volume. The image time series were unwarped using the voxel displacement map to take into account local distortion of the magnetic field, and spatially realigned using a sinc interpolation algorithm that estimates rigid-body transformations (translations, rotations). Images were then spatially smoothed using an isotropic 5 mm full-width-at-half-maximum Gaussian kernel. The first realigned and unwarped functional image was coregistered with an unwarped single-band reference image recorded at the onset of each run, which was itself coregistered with the T1 and T2 anatomical images. These anatomical images were segmented into grey matter (GM), white matter (WM) and cerebrospinal fluid (CSF) using SPM12 ‘New segment'. GM, WM and CSF tissue probability maps from our sample of 21 included participants were used to form a DARTEL template [26]. The deformation flow fields from individual spaces to this template were used to normalize the beta images resulting from the individual subjects' analyses (i.e. in subjects' individual space) for use in a random-effects second-level analysis.
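Below is a minimal sketch of two of these preprocessing steps (realignment and 5 mm smoothing) using nipype's SPM interfaces; the file name is hypothetical, and the fieldmap unwarping, slice-timing correction, coregistration, segmentation and DARTEL normalization steps described above are omitted.

```python
from nipype.interfaces import spm

# Rigid-body realignment of one run's EPI time series (estimates translations/rotations
# and writes resliced images plus the rp_*.txt realignment parameters)
realign = spm.Realign(in_files='sub-01_run-01_bold.nii',   # hypothetical file name
                      register_to_mean=True)
realign_results = realign.run()

# Isotropic 5 mm FWHM Gaussian smoothing of the realigned images, as in the paper
smooth = spm.Smooth(in_files=realign_results.outputs.realigned_files,
                    fwhm=[5, 5, 5])
smooth.run()
```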

Potential artefacts from blood pulse and respiration were controlled for using the standard procedure of the Translational Algorithms for Psychiatry-Advancing Science (TAPAS) PhysIO toolbox (https://www.tnu.ethz.ch/de/software/tapas/documentations/physio-toolbox.html; [27]). Realignment parameters (translations and rotations), as well as their derivatives and the squares of both the parameters and their derivatives, were used as covariates to control for movement-related artefacts. We also used the Artifact Detection Tools (ART) to control for any movement-related artefacts (www.nitrc.org/projects/artifact_detect/), using the standard threshold of 2 mm.
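A minimal sketch of the 24-parameter motion model and the 2 mm outlier flagging is given below; the file name is hypothetical and the composite translation metric is a simplification of what ART actually computes.

```python
import numpy as np

# SPM realignment parameters for one run: shape (n_scans, 6),
# three translations in mm followed by three rotations in rad (hypothetical file name)
rp = np.loadtxt('rp_sub-01_run-01.txt')

# 24-parameter motion model: the parameters, their temporal derivatives,
# and the squares of both, used as nuisance covariates in the GLM
drp = np.vstack([np.zeros((1, 6)), np.diff(rp, axis=0)])
motion_24 = np.hstack([rp, drp, rp ** 2, drp ** 2])

# Outlier flagging in the spirit of ART: mark scans whose composite translation
# exceeds 2 mm (ART's actual composite motion metric differs in detail)
translation = np.linalg.norm(rp[:, :3], axis=1)
outlier_scans = np.flatnonzero(translation > 2.0)
```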

The fMRI time series were analysed using the general linear model approach implemented in SPM. Single-subject models consisted of one regressor representing the 1 min discussions with each of the two interacting agents, and another representing the presentation of the images.
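As an illustration of this single-subject model, the sketch below builds a design matrix for a two-block excerpt of one run with nilearn; the onsets follow the block timing of §2e, but the exact event table, scan count and drift settings are our assumptions, not those of the SPM model actually used.

```python
import numpy as np
import pandas as pd
from nilearn.glm.first_level import make_first_level_design_matrix

t_r = 1.205
n_scans = 400                              # assumed number of EPI volumes per run
frame_times = np.arange(n_scans) * t_r

# Hypothetical event table for the first two blocks of a run:
# image display (8.3 s), then a 60 s conversation with the human (HHI) or robot (HRI)
events = pd.DataFrame({
    'onset':      [0.0, 11.6, 76.2, 87.8],
    'duration':   [8.3, 60.0, 8.3, 60.0],
    'trial_type': ['image', 'HHI', 'image', 'HRI'],
})

design = make_first_level_design_matrix(frame_times, events,
                                        hrf_model='spm',        # SPM canonical HRF
                                        drift_model='cosine')
print(design.columns.tolist())  # ['HHI', 'HRI', 'image', drift terms..., 'constant']
```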

After normalization, the beta estimate images were entered into a mixed-model analysis of variance (using SPM ‘full ANOVA') with participants and sessions as random factors and the nature of the interacting agent as the factor of interest, for inferences at the population level. A mask was created on the basis of the mean of the DARTEL-normalized anatomical GM and WM tissue classes of each participant; this mean image was also used for rendering the results in figures 2 and 3.
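A simplified group-level equivalent is sketched below with nilearn: one HHI-minus-HRI contrast image per participant entered into a one-sample test, followed by an FDR-based threshold. This paired-difference approach and the file names are our assumptions; it approximates, but is not identical to, the SPM mixed ANOVA used here and the cluster-level FDR correction described in §2g.

```python
import numpy as np
import pandas as pd
from nilearn.glm import threshold_stats_img
from nilearn.glm.second_level import SecondLevelModel

# One 'HHI minus HRI' contrast image per included participant (hypothetical file names)
contrast_imgs = [f'sub-{i:02d}_HHI_minus_HRI_con.nii' for i in range(1, 22)]
design_matrix = pd.DataFrame({'intercept': np.ones(len(contrast_imgs))})

# One-sample test on the paired differences (a simplification of the SPM full ANOVA)
group_model = SecondLevelModel().fit(contrast_imgs, design_matrix=design_matrix)
z_map = group_model.compute_contrast('intercept', output_type='z_score')

# Voxel-wise FDR (q < 0.05) with a minimum cluster extent, as a rough stand-in for
# the cluster-level FDR correction used in the paper (the two are not equivalent)
thresholded_map, threshold = threshold_stats_img(
    z_map, alpha=0.05, height_control='fdr', cluster_threshold=20)
```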

Figure 2.

Render of the brain surface of the mean of the coregistered and normalized brains from our participants sample. Overlaid are the results of the contrasts of interest (p < 0.05 false-discovery rate (FDR)-corrected at the cluster level). Upper row shows the contrast of the human–human interaction (HHI) versus baseline in blue, and of the human–robot interaction (HRI) versus baseline in red. Lower row shows the contrast HHI versus HRI in blue, and HRI versus HHI in red.

Figure 3.

Coronal (top images), sagittal (middle images) and axial (bottom images) sections focusing on the cluster identifying subcortical structures significantly activated in HHI versus HRI.

We first assessed the main effect of the conversation with each agent against the implicit baseline. We then looked specifically at the effects of each interacting agent contrasted with the other, with a clear focus on brain areas involved in mentalizing and social motivation in the contrast HHI versus HRI.

All statistical inference was performed applying a threshold of p = 0.05, false-discovery rate (FDR)-corrected for the whole brain at the cluster level [28]. Anatomical localization of the resulting clusters relied on the projection of the results onto the mean anatomical image of our pool of participants resulting from the DARTEL coregistration.

3. Results

(a). Cover story debriefing

A verbal debriefing was performed in an undirected and open format, to allow the participants to report their experience in an unbiased manner. None of the participants reported feelings of distress during the experiment with either interaction partner, or doubts about the autonomous nature of the conversational robot. In conclusion, all participants still believed in the cover story at the end of the recordings.

(b). Assessment of participants' movements during scanning

No participant was excluded on the basis of the assessment of movement using the ART toolbox (https://www.nitrc.org/projects/artifact_detect/). At the movement threshold used, between zero and a maximum of three images per session and participant were flagged as outliers. In the absence of large artefacts, all scans and sessions from the 21 participants were included in the analysis. Moreover, using the same metric to compute a global movement value per block of discussion, session and subject, an analysis of variance showed no difference in motion between the two interacting agents (F(1,495) = 2.22, p = 0.14; see electronic supplementary material, figure S2).

(c). Participants' behaviour

The full transcription of the 504 min of discussion collected for the corpus is ongoing (examples, as well as the link to the data repository, are presented in the electronic supplementary material; see Files S3–S5). Yet the confederates observed that the discussions with the two agents differed in the speed and emotion conveyed by the participant's voice. Participants generally spoke faster and with greater prosodic variation with the human than with the robot agent. Humour was also observed in the conversations with the human, but not with the robot. These observations are expected given the differences in conversational competence between the two agents.

(d). Functional magnetic resonance imaging results

The main effects of conversation with the human and with the robot largely overlapped (figure 2, top). As predicted given the nature of the task, common activation clusters were found bilaterally along the superior temporal sulcus and gyrus, the central operculum, the lateral and ventral occipital cortex, the lateral premotor cortex, the supplementary motor area and the ventral and dorsal cerebellum, as well as in the left inferior frontal gyrus. Differences between the resulting activation maps for the human and robot agents were quantitative rather than qualitative, with larger clusters mostly related to motor control (in the region of the precentral and postcentral gyri) for the robot and to speech processing (in the temporal cortex) for the human.

The contrast of HHI versus HRI (see figure 2, bottom left) revealed bilateral activation in the superior temporal gyrus and sulcus that overlapped partly with the temporal areas associated with the main effects of the conversations. It extended anteriorly to the temporal poles and to the posterior lateral orbitofrontal cortex. Posteriorly, it covered the TPJ and lateral occipital cortex. Another significant cluster covered a number of subcortical structures: the bilateral thalamus, hypothalamus, hippocampus, amygdala, caudate nucleus and the subthalamic area. We also found bilateral activation in the cerebellum centred on the horizontal fissure. No medial prefrontal cluster was found at the threshold used.

The reverse contrast HRI versus HHI identified a number of bilateral activation clusters. In the occipital region, a cluster centred on the striate cortex extended to the lingual and fusiform gyri. Furthermore, strong activation was found bilaterally within the intraparietal sulcus, extending to the supramarginal gyrus. Clusters were also found in the middle frontal gyrus and centred on the lateral part of the central sulcus.

4. Discussion

We introduce a novel paradigm to investigate the neural bases of natural interactions between humans, in line with a second-person neuroscience approach. We chose live bidirectional conversation as the operationalization of natural interaction, given that it is the most common form of communication between humans. The scientific challenge is twofold.

Methodologically, investigating natural interactions implies that the classical experimental approach, in which only one parameter is changed between experimental conditions, is not applicable. Here, we use a robot as a high-level control condition: the conversational robot reproduces a number of sensorimotor aspects of the conversation, yet is far from mimicking a real human, and it does not elicit the adoption of an intentional stance in Dennett's sense [19]. Hence, interacting with the robot in the current paradigm can be considered non-social, yielding a unique control condition for the social interaction with a fellow human.

Technically, the constraints of MRI recordings are numerous for a live bidirectional conversation during fMRI scanning: participants lie supine in a very noisy environment and are required to avoid any movement to ensure the quality of the data. We decided to hold the head firmly using foam pads while keeping the jaw free. Importantly, post hoc assessment of individual participants' movements showed very limited motion and no quantitative difference between the human and robot conditions, confirming the feasibility of the task.

The main objective of the analysis presented in this article is to evaluate the quality of the recorded fMRI data, the main part of a unique corpus of neural, physiological and behavioural data. We had strong hypotheses about the brain responses expected to be common to conversation with the two agents, as well as about the differences between interaction with the human versus the robot.

(a). Commonly activated areas

We report a large number of commonly activated areas in the main effects of HHI and HRI that can be directly related to sensorimotor aspects of the conversation. As expected, they cover the dorsal half of the posterior temporal cortex bilaterally, known as the main brain region for auditory speech perception and comprising functional areas such as the primary auditory cortex and the temporal voice areas [29,30]. Common activations are also found in motor-related areas that are involved in the motor aspects of speech production. In particular, the ventral and opercular region below the central sulcus and the adjacent precentral and postcentral gyri is likely to include primary motor and sensory regions involved in verbalization (e.g. [14]), while the lateral cluster in the central sulcus area maps onto the sensorimotor representation of the larynx [31]. The left inferior frontal gyrus corresponds to Broca's area, crucial for the production of speech. The medial premotor areas and the cerebellum are generally associated with the timing of action, which is crucial for articulation (see [14] for review). Note that these motor areas could also be involved in speech perception according to the motor theory of speech perception [32]. Indeed, a recent study revealed correlated activation in the temporal auditory areas and the inferior frontal gyrus during successful coupling between a speaker and a listener in a delayed interaction [33]. The current results show that live bidirectional conversation, irrespective of the agent, activates a network of brain regions previously associated with speech perception and production. Unfortunately, speech production and perception cannot be distinguished in the current analysis, but this will be the object of future exploration of the corpus. Finally, the large cluster spanning the lateral and ventral occipital cortex most probably reflects the processing of visual information, namely the talking face of the human or robot agent.

(b). Areas of increased activation in the human–human interaction condition

Interaction with a fellow human, as compared to the robot, revealed activation in the temporal cortex, including the bilateral TPJ, and subcortical activation in the hypothalamus, the thalamus, the hippocampus, the amygdala and the subthalamic area. These results are in line with our predictions, except for the absence of activation in the anterior medial frontal cortex. Activation in the TPJ and hypothalamus has been reported in previous studies comparing human with robot interaction [11–13]. TPJ activation has recently been reported when explicitly ascribing human intention to robot behaviour [34]. Hypothalamus activation during HHI versus HRI has been linked to enhanced social motivation [12,20], given the release of oxytocin by hypothalamic subnuclei [35,36]. The amygdala has specifically been related to social, as compared with monetary, reward [23]. It is a key neural node in the processing of emotionally and socially relevant information, coding the saliency, reward and value of social stimuli [37].

(c). Areas of increased activation in the human–robot interaction condition

The contrast HRI versus HHI showed significant activation in visual areas, including the fusiform gyrus, which hosts the fusiform face area (FFA), in the intraparietal sulcus (IPS) and in anterior parts of the middle frontal gyrus (MFG). We did not have specific hypotheses for the effect of HRI compared to HHI, so all interpretation remains speculative. Enhanced activity in an area prominently involved in human face perception (FFA) has previously been reported for action observation when comparing robotic with human movements [38] and, alongside enhanced activation in visual areas, for the perception of robot compared with human faces [10]. This has been interpreted as additional visual processing effort to identify an unfamiliar, robotic face [10]. Interestingly, the intraparietal sulcus has been associated with the ‘uncanny valley’ effect [39] and interpreted as reflecting increased attention towards unfamiliar stimuli. Enhanced responses in visual areas, the IPS and the MFG seem in line with studies investigating mechanistic versus social reasoning [40] and ratings of images depicting machines (including robots) versus humans [41].

Overall, we largely confirmed our hypotheses for brain activation in response to a human compared with a robot conversation. Our findings confirm that processes of mentalizing and social motivation are enhanced in our paradigm when interacting with a human rather than with a robot. These results further confirm the quality and validity of the recorded brain imaging data, the central part of a corpus that also includes the behavioural and physiological data collected with the approach presented in this paper.

5. Limitations

We present an approach towards truly reciprocal, interactive social neuroscience, together with first supporting neurophysiological results. One major concern in fMRI studies involving language is the risk of extensive movement artefacts induced by the motor aspects of speech production. Yet, in the present study, we observed hardly any speech-induced movement during recording (see electronic supplementary material, figure S2).

The pre-scripted sentences of the robot were shorter and more limited than the human's utterances. The robot's intonation, and its head and face movements more generally, were not controlled in the current experiment. Thus, the human conversations are expected to have differed from the robot conversations, which is likely to explain some of the differences in brain activity reported here.

The univariate fMRI analysis presented here is not sufficient to investigate the complex dynamics of the interactions. The corpus collected contains not only fMRI data but also behavioural (linguistic, eye tracking of the participant, video of the other agent during the interaction) and physiological (respiration and blood pulse) data. Future work on the corpus will entail fine-grained description of the behaviour, which will fuel the analysis of the fMRI data. Transcription of the speech recordings is underway (see the electronic supplementary material for an example of a transcription) and will be made publicly available together with the fMRI data.

Also, future studies should include explicit measures of the perception of robots in general, and of the conversational robot used in the experiment more specifically, in the form of questionnaires that would provide insights about individuals' variations in their expectations about the robot's capacity.

6. Conclusion

We investigated natural interaction by comparing HHI and HRI using fMRI. Using a conversational robot as the control condition allowed us to preserve the reciprocal dynamics of the interaction. Results for HHI showed activity in brain areas associated with mentalizing and social motivation. The article introduces an innovative paradigm within a second-person neuroscience approach. As such, it could serve as a starting point for social neuroscience to investigate the specificities of human social cognition, as well as to quantify, and thereby help improve, the social competence of robots interacting with humans.

Supplementary Material

Supplementary Information
rstb20180033supp1.docx (503.3KB, docx)

Data accessibility

Transcribed linguistic data can be found on Ortolang (https://www.ortolang.fr): https://hdl.handle.net/11403/convers; fMRI group data on Neurovault (https://neurovault.org/): /collections/ASGXRWEM/; fMRI raw data can be found on OpenNeuro (https://openneuro.org/): https://openneuro.org/datasets/ds001740.

Competing interests

We declare we have no competing interests.

Funding

Research supported by grants ANR-16-CONV-0002 (ILCB), ANR-11-LABX-0036 (BLRI) and AAP-ID-17-46-170301-11.1 by the Excellence Initiative of Aix-Marseille University (A*MIDEX), a French “Investissement d'Avenir” programme. B.R. is supported by the Fondation pour la Recherche Médicale (FRM, SPF20171039127).

References

1. Schilbach L, Timmermans B, Reddy V, Costall A, Bente G, Schlicht T, Vogeley K. 2013. Toward a second-person neuroscience. Behav. Brain Sci. 36, 393–414. (doi:10.1017/S0140525X12000660)
2. Schilbach L. 2015. Eye to eye, face to face and brain to brain: novel approaches to study the behavioral dynamics and neural mechanisms of social interactions. Curr. Opin. Behav. Sci. 3, 130–135. (doi:10.1016/j.cobeha.2015.03.006)
3. Pan X, Hamilton AF. 2018. Why and how to use virtual reality to study human social interaction: the challenges of exploring a new research landscape. Br. J. Psychol. 109, 395–417. (doi:10.1111/bjop.12290)
4. Hari R, Henriksson L, Malinen S, Parkkonen L. 2015. Centrality of social interaction in human brain function. Neuron 88, 181–193. (doi:10.1016/j.neuron.2015.09.022)
5. Klapper A, Ramsey R, Wigboldus D, Cross ES. 2014. The control of automatic imitation based on bottom–up and top–down cues to animacy: insights from brain and behavior. J. Cogn. Neurosci. 26, 2503–2513. (doi:10.1162/jocn_a_00651)
6. Schilbach L, Wohlschlaeger AM, Kraemer NC, Newen A, Shah NJ, Fink GR, Vogeley K. 2006. Being with virtual others: neural correlates of social interaction. Neuropsychologia 44, 718–730. (doi:10.1016/j.neuropsychologia.2005.07.017)
7. Schilbach L, Wilms M, Eickhoff SB, Romanzetti S, Tepest R, Bente G, Vogeley K. 2010. Minds made for sharing: initiating joint attention recruits reward-related neurocircuitry. J. Cogn. Neurosci. 22, 2702–2715. (doi:10.1162/jocn.2009.21401)
8. Wilms M, Schilbach L, Pfeiffer U, Bente G, Fink GR, Vogeley K. 2010. It's in your eyes—using gaze-contingent stimuli to create truly interactive paradigms for social cognitive and affective neuroscience. Social Cogn. Affect. Neurosci. 5, 98–107. (doi:10.1093/scan/nsq024)
9. Hale J, Hamilton AFDC. 2016. Testing the relationship between mimicry, trust and rapport in virtual reality conversations. Sci. Rep. 6, 35295. (doi:10.1038/srep35295)
10. Chaminade T, Zecca M, Blakemore S-J, Takanishi A, Frith CD, Micera S, Umiltà MA. 2010. Brain response to a humanoid robot in areas implicated in the perception of human emotional gestures. PLoS ONE 5, e11577. (doi:10.1371/journal.pone.0011577)
11. Chaminade T, Rosset D, Da Fonseca D, Nazarian B, Lutscher E, Cheng G, Deruelle C. 2012. How do we think machines think? An fMRI study of alleged competition with an artificial intelligence. Front. Hum. Neurosci. 6, 103. (doi:10.3389/fnhum.2012.00103)
12. Chaminade T, Da Fonseca D, Rosset D, Cheng G, Deruelle C. 2015. Atypical modulation of hypothalamic activity by social context in ASD. Res. Autism Spectr. Disord. 10, 41–50. (doi:10.1016/j.rasd.2014.10.015)
13. Krach S, Hegel F, Wrede B, Sagerer G, Binkofski F, Kircher T. 2008. Can machines think? Interaction and perspective taking with robots investigated via fMRI. PLoS ONE 3, e2597. (doi:10.1371/journal.pone.0002597)
14. Price CJ. 2012. A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. Neuroimage 62, 816–847. (doi:10.1016/j.neuroimage.2012.04.062)
15. Adolphs R. 1999. Social cognition and the human brain. Trends Cogn. Sci. 3, 469–479. (doi:10.1016/S1364-6613(99)01399-6)
16. Frith CD, Frith U. 2007. Social cognition in humans. Curr. Biol. 17, R724–R732. (doi:10.1016/j.cub.2007.05.068)
17. Gallagher HL, Jack AI, Roepstorff A, Frith CD. 2002. Imaging the intentional stance in a competitive game. Neuroimage 16, 814–821. (doi:10.1006/nimg.2002.1117)
18. Frith CD, Frith U. 1999. Interacting minds—a biological basis. Science 286, 1692–1695. (doi:10.1126/science.286.5445.1692)
19. Dennett DC. 1989. The intentional stance. Cambridge, MA: MIT Press.
20. Chevallier C, Kohls G, Troiani V, Brodkin ES, Schultz RT. 2012. The social motivation theory of autism. Trends Cogn. Sci. 16, 231–239. (doi:10.1016/j.tics.2012.02.007)
21. Wolfe FH, Auzias G, Deruelle C, Chaminade T. 2015. Focal atrophy of the hypothalamus associated with third ventricle enlargement in autism spectrum disorder. Neuroreport 26, 1017–1022. (doi:10.1097/WNR.0000000000000461)
22. Wolfe FH, Deruelle C, Chaminade T. 2018. Are friends really the family we choose? Local variations of hypothalamus activity when viewing personally known faces. Social Neurosci. 13, 289–300. (doi:10.1080/17470919.2017.1317662)
23. Rademacher L, Krach S, Kohls G, Irmak A, Gründer G, Spreckelmeyer KN. 2010. Dissociation of neural networks for anticipation and consumption of monetary and social rewards. Neuroimage 49, 3276–3285. (doi:10.1016/j.neuroimage.2009.10.089)
24. Al Moubayed S, Beskow J, Skantze G, Granström B. 2012. Furhat: a back-projected human-like robot head for multiparty human–machine interaction. In Cognitive behavioural systems (eds A Esposito, AM Esposito, A Vinciarelli, R Hoffmann, VC Müller), pp. 114–130. Berlin, Germany: Springer. (doi:10.1007/978-3-642-34584-5_9)
25. Chaminade T. 2017. An experimental approach to study the physiology of natural social interactions. Interact. Stud. 18, 254–275. (doi:10.1075/is.18.2.06gry)
26. Ashburner J. 2007. A fast diffeomorphic image registration algorithm. Neuroimage 38, 95–113. (doi:10.1016/j.neuroimage.2007.07.007)
27. Kasper L, Bollmann S, Diaconescu AO, Hutton C, Heinzle J, Iglesias S, Stephan KE. 2017. The PhysIO toolbox for modeling physiological noise in fMRI data. J. Neurosci. Methods 276, 56–72. (doi:10.1016/j.jneumeth.2016.10.019)
28. Friston KJ, Holmes A, Poline J-B, Price CJ, Frith CD. 1996. Detecting activations in PET and fMRI: levels of inference and power. Neuroimage 4, 223–235. (doi:10.1006/nimg.1996.0074)
29. Belin P, Fecteau S, Bedard C. 2004. Thinking the voice: neural correlates of voice perception. Trends Cogn. Sci. 8, 129–135. (doi:10.1016/j.tics.2004.01.008)
30. Belin P, Zatorre RJ, Lafaille P, Ahad P, Pike B. 2000. Voice-selective areas in human auditory cortex. Nature 403, 309. (doi:10.1038/35002078)
31. Brown S, Laird AR, Pfordresher PQ, Thelen SM, Turkeltaub P, Liotti M. 2009. The somatotopy of speech: phonation and articulation in the human motor cortex. Brain Cogn. 70, 31–41. (doi:10.1016/j.bandc.2008.12.006)
32. Galantucci B, Fowler CA, Turvey MT. 2006. The motor theory of speech perception reviewed. Psychon. Bull. Rev. 13, 361–377. (doi:10.3758/BF03193857)
33. Stephens GJ, Silbert LJ, Hasson U. 2010. Speaker–listener neural coupling underlies successful communication. Proc. Natl Acad. Sci. USA 107, 14425–14430. (doi:10.1073/pnas.1008662107)
34. Özdem C, Wiese E, Wykowska A, Müller H, Brass M, Van Overwalle F. 2017. Believing androids—fMRI activation in the right temporo-parietal junction is modulated by ascribing intentions to non-human agents. Social Neurosci. 12, 582–593. (doi:10.1080/17470919.2016.1207702)
35. Bartz JA, Zaki J, Bolger N, Ochsner KN. 2011. Social effects of oxytocin in humans: context and person matter. Trends Cogn. Sci. 15, 301–309. (doi:10.1016/j.tics.2011.05.002)
36. Heinrichs M, von Dawans B, Domes G. 2009. Oxytocin, vasopressin, and human social behavior. Front. Neuroendocrinol. 30, 548–557. (doi:10.1016/j.yfrne.2009.05.005)
37. Adolphs R. 2010. What does the amygdala contribute to social cognition? Ann. N Y Acad. Sci. 1191, 42–61. (doi:10.1111/j.1749-6632.2010.05445.x)
38. Cross ES, Ramsey R, Liepelt R, Prinz W, Hamilton AFdC. 2016. The shaping of social perception by stimulus and knowledge cues to human animacy. Phil. Trans. R. Soc. B 371, 20150075. (doi:10.1098/rstb.2015.0075)
39. Saygin AP, Chaminade T, Ishiguro H. 2010. The perception of humans and robots: uncanny hills in parietal cortex. In Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 32, pp. 2716–2720. Retrieved from https://escholarship.org/uc/item/71m6d8bk.
40. Jack AI, Dawson AJ, Begany KL, Leckie RL, Barry KP, Ciccia AH, Snyder AZ. 2013. fMRI reveals reciprocal inhibition between social and physical cognitive domains. Neuroimage 66, 385–401. (doi:10.1016/j.neuroimage.2012.10.061)
41. Jack AI, Dawson AJ, Norr ME. 2013. Seeing human: distinct and overlapping neural signatures associated with two forms of dehumanization. Neuroimage 79, 313–328. (doi:10.1016/j.neuroimage.2013.04.109)
